Enterococcus faecalis Polynucleotides and Polypeptides



FIELD OF THE INVENTION 

The present invention relates to the field of molecular biology. In particular, it
relates to, among other things, nucleotide sequences of Enterococcus faecalis, contigs,
ORFs, fragments, probes, primers and related polynucleotides thereof, peptides and
polypeptides encoded by the sequences, and uses of the polynucleotides and sequences
thereof, such as in fermentation, polypeptide production, assays and pharmaceutical
development, among others.

BACKGROUND OF THE INVENTION 

Enterococci have been recognized as being pathogenic for humans since the turn
of the century, when they were first described by Thiercelin in 1899 as microscopic
organisms. The genus Enterococcus includes the species Enterococcus faecalis, or E.
faecalis, which is the most common pathogen in the group, accounting for 80-90
percent of all enterococcal infections. See Lewis et al. (1990) Eur. J. Clin. Microbiol.
Infect. Dis. 9:111-117.

The incidence of enterococcal infections has increased in recent years and
enterococci are now the second most frequently reported nosocomial pathogens.
Enterococcal infection is of particular concern because of its resistance to antibiotics.
Recent attention has focused on enterococci not only because of their increasing role in
nosocomial infections, but also because of their remarkable and increasing resistance to
antimicrobial agents. These factors are mutually reinforcing since resistance allows
enterococci to survive in an environment in which antimicrobial agents are heavily used;
the hospital setting provides the antibiotics which eliminate or suppress susceptible
bacteria, thereby providing a selective advantage for resistant organisms, and the hospital
also provides the potential for dissemination of resistant enterococci via the usual routes
of hand and environmental contamination.

Antimicrobial resistance can be divided into two general types: that which is an
inherent or intrinsic property, and that which is acquired. The genes for intrinsic
resistance, like other species characteristics, appear to reside on the chromosome.
Acquired resistance results from either a mutation in the existing DNA or acquisition of
new DNA. The various inherent traits expressed by enterococci include resistance to
semisynthetic penicillinase-resistant penicillins, cephalosporins, low levels of
aminoglycosides, and low levels of clindamycin. Examples of acquired resistance include
resistance to chloramphenicol, erythromycin, high levels of clindamycin, tetracycline,
high levels of aminoglycosides, penicillin by means of penicillinase, fluoroquinolones,
and vancomycin. Resistance to high levels of penicillin without penicillinase and
resistance to fluoroquinolones are not known to be plasmid or transposon mediated and
presumably are due to mutation(s).

Although the main reservoir for enterococci in humans is the gastrointestinal 
tract, the bacteria can also reside in the gallbladder, urethra and vagina. 

E. faecalis has emerged as an important pathogen in endocarditis, bacteremia,
urinary tract infections (UTIs), intraabdominal infections, soft tissue infections, and
neonatal sepsis (Lewis 1990, supra). In the 1970s and 1980s enterococci became firmly
established as major nosocomial pathogens. They are now the fourth leading cause of
hospital-acquired infection and the third leading cause of bacteremia in the United States.
Fatality ratios for enterococcal bacteremia range from 12% to 68%, with death due to
enterococcal sepsis in 4 to 50% of these cases. See Emori, T.G. (1993) Clin. Microbiol.
Rev. 6:428-442.

The ability of enterococci to colonize the gastrointestinal tract, plus the many
intrinsic and acquired resistance traits, means that these organisms, which usually seem to
have relatively low intrinsic virulence, are given an excellent opportunity to become
secondary invaders. Since nosocomial isolates of enterococci have displayed resistance to
essentially every useful antimicrobial agent, it will likely become increasingly difficult to
successfully treat and control enterococcal infections, particularly when the various
resistance genes come together in a single strain, an event almost certain to occur at
some time in the future.

The etiology of diseases mediated or exacerbated by Enterococcus faecalis
involves the programmed expression of E. faecalis genes, and characterizing these
genes and their patterns of expression would add dramatically to our understanding of the
organism and its host interactions. Knowledge of the E. faecalis gene and genomic
organization would improve our understanding of disease etiology and lead to improved
and new ways of preventing, treating and diagnosing diseases. Thus, there is a need to
characterize the genome of E. faecalis and for polynucleotides of this organism.

SUMMARY OF THE INVENTION 

The present invention is based on the sequencing of fragments of the
Enterococcus faecalis genome. The primary nucleotide sequences which were generated
are provided in SEQ ID NOS: 1-982.

The present invention provides the nucleotide sequence of hundreds of contigs of
the Enterococcus faecalis genome, which are listed in tables below and set out in the
Sequence Listing submitted herewith, and representative fragments thereof, in a form
which can be readily used, analyzed, and interpreted by a skilled artisan. In one
embodiment, the present invention is provided as contiguous strings of primary sequence
information corresponding to the nucleotide sequences depicted in SEQ ID NOS: 1-982.

The present invention further provides nucleotide sequences which are at least
95%, 96%, 97%, 98%, or 99% identical to the nucleotide sequences of SEQ ID NOS: 1-982.

The nucleotide sequence of SEQ ID NOS: 1-982, a representative fragment
thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide
sequence of SEQ ID NOS: 1-982 may be provided in a variety of mediums to facilitate its
use. In one application of this embodiment, the sequences of the present invention are
recorded on computer readable media. Such media include, but are not limited to: magnetic
storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical
storage media such as CD-ROM; electrical storage media such as RAM and ROM; and
hybrids of these categories such as magnetic/optical storage media.

The present invention further provides systems, particularly computer-based
systems, which contain the sequence information herein described stored in a data storage
means. Such systems are designed to identify commercially important fragments of the
Enterococcus faecalis genome.

Another embodiment of the present invention is directed to fragments of the
Enterococcus faecalis genome having particular structural or functional attributes. Such
fragments of the Enterococcus faecalis genome of the present invention include, but are
not limited to, fragments which encode peptides, hereinafter referred to as open reading
frames or ORFs, fragments which modulate the expression of an operably linked ORF,
hereinafter referred to as expression modulating fragments or EMFs, and fragments
which can be used to diagnose the presence of Enterococcus faecalis in a sample,
hereinafter referred to as diagnostic fragments or DFs.

Each of the ORFs in fragments of the Enterococcus faecalis genome disclosed in
Tables 1-3, and the EMFs found 5' of the initiation codon, can be used in numerous
ways as polynucleotide reagents. For instance, the sequences can be used as diagnostic
probes or amplification primers for detecting or determining the presence of a specific
microbe in a sample, to selectively control gene expression in a host, and in the
production of polypeptides, such as polypeptides encoded by ORFs of the present
invention, particularly those polypeptides that have a pharmacological activity.

The present invention further includes recombinant constructs comprising one or
more fragments of the Enterococcus faecalis genome of the present invention. The
recombinant constructs of the present invention comprise vectors, such as a plasmid or
viral vector, into which a fragment of the Enterococcus faecalis genome has been inserted.

The present invention further provides host cells containing any of the isolated
fragments of the Enterococcus faecalis genome of the present invention. The host cells
can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic cell,
such as a yeast cell, or a procaryotic cell such as a bacterial cell.

The present invention is further directed to isolated polypeptides and proteins
encoded by ORFs of the present invention. A variety of methods, well known to those of
skill in the art, routinely may be utilized to obtain any of the polypeptides and proteins
of the present invention. For instance, polypeptides and proteins of the present
invention having relatively short, simple amino acid sequences readily can be synthesized
using commercially available automated peptide synthesizers. Polypeptides and proteins
of the present invention also may be purified from bacterial cells which naturally produce
the protein. Yet another alternative is to purify polypeptides and proteins of the present
invention from cells which have been altered to express them.

The invention further provides methods of obtaining homologs of the fragments
of the Enterococcus faecalis genome of the present invention and homologs of the
proteins encoded by the ORFs of the present invention. Specifically, by using the
nucleotide and amino acid sequences disclosed herein as a probe or as primers, and
techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can
obtain homologs.

The invention further provides antibodies which selectively bind polypeptides and 
proteins of the present invention. Such antibodies include both monoclonal and 
polyclonal antibodies. 

The invention further provides hybridomas which produce the above-described
antibodies. A hybridoma is an immortalized cell line which is capable of secreting a
specific monoclonal antibody.

The present invention further provides methods of identifying test samples
derived from cells which express one of the ORFs of the present invention, or a homolog
thereof. Such methods comprise incubating a test sample with one or more of the
antibodies of the present invention, or one or more of the DFs of the present invention,
under conditions which allow a skilled artisan to determine if the sample contains the
ORF or product produced therefrom.

In another embodiment of the present invention, kits are provided which contain
the necessary reagents to carry out the above-described assays.

Specifically, the invention provides a compartmentalized kit to receive, in close
confinement, one or more containers, which comprises: (a) a first container comprising
one of the antibodies, or one of the DFs of the present invention; and (b) one or more
other containers comprising one or more of the following: wash reagents, and reagents
capable of detecting the presence of bound antibodies or hybridized DFs.

Using the isolated proteins of the present invention, the present invention
further provides methods of obtaining and identifying agents capable of binding to a
polypeptide or protein encoded by one of the ORFs of the present invention.
Specifically, such agents include, as further described below, antibodies, peptides,
carbohydrates, pharmaceutical agents and the like. Such methods comprise steps of:
(a) contacting an agent with an isolated protein encoded by one of the ORFs of the
present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of Enterococcus faecalis will be of great value to
all laboratories working with this organism and for a variety of commercial purposes.
Many fragments of the Enterococcus faecalis genome will be immediately identified by
similarity searches against GenBank or protein databases and will be of immediate value to
Enterococcus faecalis researchers and of immediate commercial value for the production
of proteins or the control of gene expression.

The methodology and technology for elucidating extensive genomic sequences of
bacterial and other genomes have greatly enhanced, and will continue to enhance, the
ability to analyze and understand chromosomal organization. In particular, sequenced
contigs and genomes will provide the models for developing tools for the analysis of
chromosome structure and function, including the ability to identify genes within large
segments of genomic DNA, the structure, position, and spacing of regulatory elements, the
identification of genes with potential industrial applications, and the ability to perform
comparative genomics and molecular phylogeny.

DESCRIPTION OF THE FIGURES 

FIGURE 1 is a block diagram of a computer system (102) that can be used to
implement computer-based systems of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer programs
used to collect, assemble, edit and annotate the contigs of the Enterococcus faecalis
genome of the present invention. Both Macintosh and Unix platforms are used to handle
the AB 373 and 377 sequence data files, largely as described in Kerlavage et al.,
Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System
Sciences, 585, IEEE Computer Society Press, Washington D.C. (1993). Factura (AB) is a
Macintosh program designed for automatic vector sequence removal and end-trimming of
sequence files. The program Sequis runs on a Macintosh platform and parses the feature
data extracted from the sequence files by Factura to the Unix-based Enterococcus faecalis
relational database. Assembly of contigs (and whole genome sequences) is accomplished
by retrieving a specific set of sequence files and their associated features using Extrseq, a
Unix utility for retrieving sequences from an SQL database. The resulting sequence file is
processed by seq_filter to trim portions of the sequences with more than 1% ambiguous
nucleotides. The sequence files were assembled using TIGR Assembler, an assembly engine
designed at The Institute for Genomic Research (TIGR) for rapid and accurate assembly
of thousands of sequence fragments. The collection of contigs generated by the assembly
step is loaded into the database with the lassie program. Identification of open reading
frames (ORFs) is accomplished by processing contigs with GeneMark, described in
Borodovsky, M. and McIninch, J.D. (1993) Comput. Chem. 17:123-133. The ORFs are
searched against E. faecalis sequences from GenBank and against all protein sequences
using the BLASTN and BLASTP programs, described in Altschul et al., J. Mol. Biol. 215:
403-410 (1990). Results of the ORF determination and similarity searching steps were
loaded into the database. As described below, some results of the determination and the
searches are set out in Tables 1-3.
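By way of illustration only, the idea of trimming sequence portions that contain more
than 1% ambiguous nucleotides can be sketched as follows. This is not the seq_filter
program, whose internal algorithm is not described here. The function names, the
window-based end-trimming strategy, and the default window size are all assumptions made
for the sketch; only the 1% threshold comes from the text above.

```python
# Hypothetical sketch: NOT the seq_filter utility. The window-based strategy,
# the function names and the default window size are illustrative assumptions.
UNAMBIGUOUS = frozenset("ACGT")


def ambiguous_fraction(seq):
    """Return the fraction of bases in seq that are not A, C, G or T."""
    if not seq:
        return 0.0
    ambiguous = sum(base.upper() not in UNAMBIGUOUS for base in seq)
    return ambiguous / len(seq)


def trim_ambiguous_ends(seq, window=100, max_fraction=0.01):
    """Trim each end until its terminal window has at most max_fraction ambiguity.

    Sequences shorter than the window are returned unchanged.
    """
    start, end = 0, len(seq)
    # Advance the 5' boundary while the leading window is too ambiguous.
    while end - start >= window and \
            ambiguous_fraction(seq[start:start + window]) > max_fraction:
        start += 1
    # Retreat the 3' boundary while the trailing window is too ambiguous.
    while end - start >= window and \
            ambiguous_fraction(seq[end - window:end]) > max_fraction:
        end -= 1
    return seq[start:end]


if __name__ == "__main__":
    read = "NNNNN" + "ACGT" * 50
    print(len(read), len(trim_ambiguous_ends(read, window=50)))
```

A production filter would presumably also use base-call quality information; the sketch
above looks only at the base symbols themselves.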

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

The present invention is based on the sequencing of fragments of the
Enterococcus faecalis genome and analysis of the sequences. The primary nucleotide
sequences generated by sequencing the fragments are provided in SEQ ID NOS: 1-982.
(As used herein, the "primary sequence" refers to the nucleotide sequence represented by
the IUPAC nomenclature system.)

In addition to the aforementioned Enterococcus faecalis polynucleotide and
polynucleotide sequences, the present invention provides the nucleotide sequences of SEQ
ID NOS: 1-982, or representative fragments thereof, in a form which can be readily used,
analyzed, and interpreted by a skilled artisan.

As used herein, a "representative fragment of the nucleotide sequence depicted in
SEQ ID NOS: 1-982" refers to any portion of the SEQ ID NOS: 1-982 which is not
presently represented within a publicly available database. Preferred representative
fragments of the present invention are Enterococcus faecalis open reading frames
(ORFs), expression modulating fragments (EMFs) and fragments which can be used to
diagnose the presence of Enterococcus faecalis in a sample (DFs). A non-limiting
identification of preferred representative fragments is provided in Tables 1-3. As
discussed in detail below, the information provided in SEQ ID NOS: 1-982 and in Tables
1-3 together with routine cloning, synthesis, sequencing and assay methods will enable
those skilled in the art to clone and sequence all "representative fragments" of
interest, including open reading frames encoding a large variety of Enterococcus
faecalis proteins.

The present invention is further directed to nucleic acid molecules encoding
portions or fragments of the nucleotide sequences described herein. Fragments include
portions of the nucleotide sequences of Tables 1-3 and SEQ ID NOS: 1-982, at least 10
contiguous nucleotides in length, selected from any two integers, one of which
represents a 5' nucleotide position and the second of which represents a 3' nucleotide
position, where the first nucleotide for each nucleotide sequence in SEQ ID NOS: 1-982 is
position 1. That is, every combination of a 5' and 3' nucleotide position that a fragment
at least 10 contiguous nucleotides in length could occupy is included in the invention.
"At least" means a fragment may be 10 contiguous nucleotide bases in length or any integer
between 10 and the length of an entire nucleotide sequence of SEQ ID NOS: 1-982 minus
1. Therefore, included in the invention are contiguous fragments specified by any 5' and
3' nucleotide base positions of a nucleotide sequence of SEQ ID NOS: 1-982 wherein the
length of the contiguous fragment is any integer between 10 and the length of an entire
nucleotide sequence minus 1.
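By way of illustration only, the position-based definition above can be sketched in a
short Python routine. The names count_fragments and iter_fragments and the min_len
parameter are hypothetical and form no part of the disclosure; the sketch assumes a
sequence held as a string, with position 1 as its first nucleotide, and fragment lengths
running from 10 up to the full length minus 1.

```python
# Illustrative sketch only; function and parameter names are assumptions.
def count_fragments(seq_len, min_len=10):
    """Count (5', 3') position pairs for fragments of min_len .. seq_len - 1."""
    if seq_len <= min_len:
        return 0
    # A fragment of length k can start at seq_len - k + 1 different positions.
    return sum(seq_len - k + 1 for k in range(min_len, seq_len))


def iter_fragments(seq, min_len=10):
    """Yield (five_prime, three_prime, fragment) using 1-based positions."""
    n = len(seq)
    for five_prime in range(1, n + 1):
        for three_prime in range(five_prime + min_len - 1, n + 1):
            # Exclude the full-length sequence itself (length n).
            if three_prime - five_prime + 1 < n:
                yield five_prime, three_prime, seq[five_prime - 1:three_prime]


if __name__ == "__main__":
    example = "ACGTACGTACGT"  # 12 nucleotides, made up for the example
    for five_prime, three_prime, fragment in iter_fragments(example):
        print(five_prime, three_prime, fragment)
    print(count_fragments(len(example)))
```

The count grows roughly with the square of the sequence length, which is why the text
defines the fragments by rule rather than listing them individually.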

Further, the invention includes polynucleotides comprising fragments specified by
size, in nucleotides, rather than by nucleotide positions. The invention includes any
fragment size, in contiguous nucleotides, selected from integers between 10 and the length
of an entire nucleotide sequence minus 1. Preferred sizes of contiguous nucleotide
fragments include 20 nucleotides, 30 nucleotides, 40 nucleotides, and 50 nucleotides. Other
preferred sizes of contiguous nucleotide fragments, which may be useful as diagnostic
probes and primers, include fragments 50-300 nucleotides in length which include, as
discussed above, fragment sizes representing each integer between 50 and 300. Larger
fragments are also useful according to the present invention, corresponding to most, if not
all, of the nucleotide sequences shown in SEQ ID NOS: 1-982. The preferred sizes are, of
course, meant to exemplify, not limit, the present invention, as all size fragments,
representing any integer between 10 and the length of an entire nucleotide sequence
minus 1, of each SEQ ID NO:, are included in the invention.

The present invention also provides for the exclusion of any fragment, specified
by 5' and 3' base positions or by size in nucleotide bases as described above for any
nucleotide sequence of SEQ ID NOS: 1-982. Any number of fragments of nucleotide
sequences in SEQ ID NOS: 1-982, specified by 5' and 3' base positions or by size in
nucleotides, as described above, may be excluded from the present invention.

While the presently disclosed sequences of SEQ ID NOS: 1-982 are highly
accurate, sequencing techniques are not perfect and, in relatively rare instances, further
investigation of a fragment or sequence of the invention may reveal a nucleotide
sequence error present in a nucleotide sequence disclosed in SEQ ID NOS: 1-982.
However, once the present invention is made available (i.e., once the information in SEQ
ID NOS: 1-982 and Tables 1-3 has been made available), resolving a rare sequencing error
in SEQ ID NOS: 1-982 will be well within the skill of the art. The present disclosure
makes available sufficient sequence information to allow any of the described contigs or
portions thereof to be obtained readily by straightforward application of routine
techniques. Further sequencing of such polynucleotides may proceed in like manner using
manual and automated sequencing methods which are employed ubiquitously in the art.
Nucleotide sequence editing software is publicly available. For example, Applied
Biosystems' (AB) AutoAssembler can be used as an aid during visual inspection of
nucleotide sequences. By employing such routine techniques potential errors readily may
be identified and the correct sequence then may be ascertained by targeting further
sequencing effort, also of a routine nature, to the region containing the potential error.

Even if all of the very rare sequencing errors in SEQ ID NOS: 1-982 were
corrected, the resulting nucleotide sequences would still be at least 95% identical, nearly
all would be at least 99% identical, and the great majority would be at least 99.9%
identical to the nucleotide sequences of SEQ ID NOS: 1-982.
As discussed elsewhere herein, polynucleotides of the present invention readily
may be obtained by routine application of well known and standard procedures for cloning
and sequencing DNA. A wide variety of Enterococcus faecalis strains that can be used to
prepare E. faecalis genomic DNA for cloning and for obtaining polynucleotides of the
present invention are available to the public from recognized depository institutions, such
as the American Type Culture Collection (ATCC). While the present invention is
enabled by the sequences and other information herein disclosed, the E. faecalis strain
that provided the DNA of the present Sequence Listing, Strain V586, kindly provided by
Dr. Michael Gilmore, University of Oklahoma, has been deposited in the ATCC, as a
convenience to those of skill in the art. The E. faecalis strain V586 was deposited 2 May
1997 at the ATCC, 10801 University Blvd., Manassas, VA 20110-2209, and given
accession number 55969. The provision of the deposits is not a waiver of any rights of
the inventors or their assignees in the present subject matter.

The nucleotide sequences of the genomes from different strains of Enterococcus
faecalis differ somewhat. However, the nucleotide sequences of the genomes of all
Enterococcus faecalis strains will be at least 95% identical, in corresponding part, to the
nucleotide sequences provided in SEQ ID NOS: 1-982. Nearly all will be at least 99%
identical and the great majority will be 99.9% identical.

The present application is further directed to nucleic acid molecules at least 90%,
95%, 96%, 97%, 98% or 99% identical to a nucleic acid sequence shown in SEQ ID NOS:
1-982. The above nucleic acid sequences are included irrespective of whether they encode
a polypeptide having E. faecalis activity. This is because even where a particular nucleic
acid molecule does not encode a polypeptide having E. faecalis activity, one of skill in
the art would still know how to use the nucleic acid molecule, for instance, as a
hybridization probe. Uses of the nucleic acid molecules of the present invention that do
not encode a polypeptide having E. faecalis activity include, inter alia, isolating an E.
faecalis gene or allelic variants thereof from a DNA library, and detecting E. faecalis
mRNA expression in samples, such as environmental samples, suspected of containing E.
faecalis by Northern blot analysis.

Preferred are nucleic acid molecules having sequences at least 90%, 95%, 96%,
97%, 98% or 99% identical to the nucleic acid sequence shown in SEQ ID NOS: 1-982,
which do, in fact, encode a polypeptide having E. faecalis protein activity. By "a
polypeptide having E. faecalis activity" is intended polypeptides exhibiting activity
similar, but not necessarily identical, to an activity of the E. faecalis protein of the
invention, as measured in a particular biological assay suitable for measuring activity of
the specified protein.

Due to the degeneracy of the genetic code, one of ordinary skill in the art will
immediately recognize that a large number of the nucleic acid molecules having a
sequence at least 90%, 95%, 96%, 97%, 98%, or 99% identical to the nucleic acid
sequences shown in SEQ ID NOS: 1-982 will encode a polypeptide having E. faecalis
protein activity. In fact, since degenerate variants of these nucleotide sequences all
encode the same polypeptide, this will be clear to the skilled artisan even without
performing the above described comparison assay. It will be further recognized in the art
that, for such nucleic acid molecules that are not degenerate variants, a reasonable number
will also encode a polypeptide having E. faecalis protein activity. This is because the
skilled artisan is fully aware of amino acid substitutions that are either less likely or not
likely to significantly affect protein function (e.g., replacing one aliphatic amino acid
with a second aliphatic amino acid), as further described below.

The biological activity or function of the polypeptides of the present invention
is expected to be similar or identical to that of polypeptides from other bacteria that
share a high degree of structural identity/similarity. Tables 1 and 2 list accession numbers
and descriptions for the closest matching sequences of polypeptides available through
GenBank. It is therefore expected that the biological activity or function of the
polypeptides of the present invention will be similar or identical to that of those
polypeptides from other bacterial genera, species, or strains listed in Tables 1 and 2.

By a polynucleotide having a nucleotide sequence at least, for example, 95%
"identical" to a reference nucleotide sequence of the present invention, it is intended that
the nucleotide sequence of the polynucleotide is identical to the reference sequence
except that the polynucleotide sequence may include up to five point mutations per each
100 nucleotides of the reference nucleotide sequence encoding the E. faecalis
polypeptide. In other words, to obtain a polynucleotide having a nucleotide sequence at
least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the
reference sequence may be deleted, inserted, or substituted with another nucleotide. The
query sequence may be an entire sequence shown in SEQ ID NOS: 1-982, the ORF (open
reading frame), or any fragment specified as described herein.
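The arithmetic in the preceding definition can be illustrated with a short Python sketch.
The function name max_point_mutations is hypothetical, and rounding down to a whole number
of nucleotides is an assumption of the sketch rather than a statement of the text.

```python
# Illustrative sketch only; the function name and the floor rounding are assumptions.
from fractions import Fraction


def max_point_mutations(reference_length, min_identity_percent):
    """Largest whole number of point mutations consistent with the identity floor."""
    identity = Fraction(str(min_identity_percent))
    if not 0 <= identity <= 100:
        raise ValueError("identity must be between 0 and 100 percent")
    allowed = reference_length * (100 - identity) / 100
    return int(allowed)  # truncates toward zero, i.e. floor for non-negative values


if __name__ == "__main__":
    # Up to five point mutations per 100 nucleotides at 95% identity.
    print(max_point_mutations(100, 95))
    print(max_point_mutations(1000, "99.9"))
```

Exact fractions are used instead of floating point so that thresholds such as 99.9% do
not suffer from rounding error.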

As a practical matter, whether any particular nucleic acid molecule or polypeptide
is at least 90%, 95%, 96%, 97%, 98% or 99% identical to a nucleotide sequence of the
present invention can be determined conventionally using known computer programs.
A preferred method for determining the best overall match between a query sequence (a
sequence of the present invention) and a subject sequence, also referred to as a global
sequence alignment, uses the FASTDB computer program based on
the algorithm of Brutlag et al. See Brutlag et al. (1990) Comp. App. Biosci. 6:237-245.
In a sequence alignment the query and subject sequences are both DNA sequences. An
RNA sequence can be compared by first converting U's to T's. The result of said global
sequence alignment is in percent identity. Preferred parameters used in a FASTDB
alignment of DNA sequences to calculate percent identity are: Matrix=Unitary,
k-tuple=4, Mismatch Penalty=1, Joining Penalty=30, Randomization Group Length=0,
Cutoff Score=1, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length
of the subject nucleotide sequence, whichever is shorter.

If the subject sequence is shorter than the query sequence because of 5' or 3'
deletions, not because of internal deletions, a manual correction must be made to the
results. This is because the FASTDB program does not account for 5' and 3' truncations
of the subject sequence when calculating percent identity. For subject sequences truncated
at the 5' or 3' ends, relative to the query sequence, the percent identity is corrected by
calculating the number of bases of the query sequence that are 5' and 3' of the subject
sequence, which are not matched/aligned, as a percent of the total bases of the query
sequence. Whether a nucleotide is matched/aligned is determined by results of the
FASTDB sequence alignment. This percentage is then subtracted from the percent
identity, calculated by the above FASTDB program using the specified parameters, to
arrive at a final percent identity score. This corrected score is what is used for the
purposes of the present invention. Only nucleotides outside the 5' and 3' nucleotides of
the subject sequence, as displayed by the FASTDB alignment, which are not
matched/aligned with the query sequence, are calculated for the purposes of manually
adjusting the percent identity score.

For example, a 90 nucleotide subject sequence is aligned to a 100 nucleotide query
sequence to determine percent identity. The deletions occur at the 5' end of the subject
sequence and therefore, the FASTDB alignment does not show a match/alignment of
the first 10 nucleotides at the 5' end. The 10 unpaired nucleotides represent 10% of the
sequence (number of nucleotides at the 5' and 3' ends not matched/total number of
nucleotides in the query sequence) so 10% is subtracted from the percent identity score
calculated by the FASTDB program. If the remaining 90 nucleotides were perfectly
matched the final percent identity would be 90%. In another example, a 90 nucleotide
subject sequence is compared with a 100 nucleotide query sequence. This time the
deletions are internal deletions so that there are no nucleotides on the 5' or 3' of the
subject sequence which are not matched/aligned with the query. In this case the percent
identity calculated by FASTDB is not manually corrected. Once again, only nucleotides
5' and 3' of the subject sequence which are not matched/aligned with the query sequence
are manually corrected for. No other manual corrections are to be made for the purposes of
the present invention.
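The manual correction described above reduces to a single subtraction, sketched here in
Python for illustration. The function and parameter names are hypothetical; the
FASTDB-reported percent identity and the counts of unmatched query bases lying 5' and 3'
of the subject sequence are assumed to have been read from the alignment output by the
user.

```python
# Illustrative sketch only; names are assumptions, and FASTDB itself is not invoked.
def corrected_percent_identity(fastdb_percent, query_length,
                               unmatched_5prime=0, unmatched_3prime=0):
    """Subtract the terminal-overhang percentage from the FASTDB percent identity.

    unmatched_5prime / unmatched_3prime: query bases lying 5' / 3' of the subject
    sequence that are not matched/aligned. Internal gaps are NOT counted here,
    because the alignment score already accounts for them.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    overhang = unmatched_5prime + unmatched_3prime
    if overhang < 0 or overhang > query_length:
        raise ValueError("unmatched terminal bases must lie within the query")
    overhang_percent = 100.0 * overhang / query_length
    return fastdb_percent - overhang_percent


if __name__ == "__main__":
    # First example in the text: 90-nt subject, 100-nt query, 10 bases missing
    # at the 5' end, remaining 90 bases perfectly matched.
    print(corrected_percent_identity(100.0, 100, unmatched_5prime=10))
    # Second example: internal deletions only, so no manual correction applies.
    print(corrected_percent_identity(90.0, 100))
```

In the second call the value 90.0 is merely a placeholder for whatever percent identity
the alignment program reports; the point is that it passes through unchanged.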

COMPUTER RELATED EMBODIMENTS 

The nucleotide sequences provided in SEQ ID NOS: 1-982, a representative
fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most
preferably at least 99.9% identical to a polynucleotide sequence of SEQ ID NOS: 1-982
may be "provided" in a variety of mediums to facilitate use thereof. As used herein,
"provided" refers to a manufacture, other than an isolated nucleic acid molecule, which
contains a nucleotide sequence of the present invention; i.e., a nucleotide sequence
provided in SEQ ID NOS: 1-982, a representative fragment thereof, or a nucleotide
sequence at least 95%, preferably at least 99% and most preferably at least 99.9%
identical to a polynucleotide of SEQ ID NOS: 1-982. Such a manufacture provides a large
portion of the Enterococcus faecalis genome and parts thereof (e.g., an Enterococcus
faecalis open reading frame (ORF)) in a form which allows a skilled artisan to examine
the manufacture using means not directly applicable to examining the Enterococcus
faecalis genome or a subset thereof as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the present 
invention can be recorded on computer readable media. As used herein, "computer 
readable media" refers to any medium which can be read and accessed directly by a 
computer. Such media include, but are not limited to: magnetic storage media, such as 
floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as 
CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these 
categories, such as magnetic/optical storage media. A skilled artisan can readily 
appreciate how any of the presently known computer readable mediums can be used to 
create a manufacture comprising computer readable medium having recorded thereon a 
nucleotide sequence of the present invention. Likewise, it will be clear to those of skill 
how additional computer readable media that may be developed also can be used to create 
analogous manufactures having recorded thereon a nucleotide sequence of the present 
invention. 

As used herein, "recorded" refers to a process for storing information on 
computer readable medium. A skilled artisan can readily adopt any of the presently known 
methods for recording information on computer readable medium to generate 
manufactures comprising the nucleotide sequence information of the present invention. 
A variety of data storage structures are available to a skilled artisan for creating a 
computer readable medium having recorded thereon a nucleotide sequence of the present 
invention. The choice of the data storage structure will generally be based on the means 
chosen to access the stored information. In addition, a variety of data processor 
programs and formats can be used to store the nucleotide sequence information of the 
present invention on computer readable medium. The sequence information can be 
represented in a word processing text file, formatted in commercially available software 
such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, 
stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan 
can readily adapt any number of data-processor structuring formats (e.g., text file or 
database) in order to obtain computer readable medium having recorded thereon the 
nucleotide sequence information of the present invention. 

Computer software is publicly available which allows a skilled artisan to access 
sequence information provided in a computer readable medium. Thus, by providing in 
computer readable form the nucleotide sequences of SEQ ID NOS: 1-982, a representative 
fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most 
preferably at least 99.9% identical to a sequence of SEQ ID NOS: 1-982, the present 
invention enables the skilled artisan routinely to access the provided sequence 
information for a wide variety of purposes. 

The examples which follow demonstrate how software which implements the 
BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., 
Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system was used to 
identify open reading frames (ORFs) within the Enterococcus faecalis genome which 
contain homology to ORFs or proteins from both Enterococcus faecalis and from other 
organisms. Among the ORFs discussed herein are protein encoding fragments of the 
Enterococcus faecalis genome useful in producing commercially important proteins, such 
as enzymes used in fermentation reactions and in the production of commercially useful 
metabolites, proteins to be used as vaccines or in the generation of immuno-therapeutic 
reagents, or as drug screening targets. 

The present invention further provides systems, particularly computer-based 
systems, which contain the sequence information described herein. Such systems are 
designed to identify, among other things, commercially important fragments of the 
Enterococcus faecalis genome. 

As used herein, "a computer-based system" refers to the hardware means, software 
means, and data storage means used to analyze the nucleotide sequence information of the 
present invention. The minimum hardware means of the computer-based systems of the 
present invention comprises a central processing unit (CPU), input means, output means, 
and data storage means. A skilled artisan can readily appreciate that any one of the 
currently available computer-based systems is suitable for use in the present invention. 

As stated above, the computer-based systems of the present invention comprise a 
data storage means having stored therein a nucleotide sequence of the present invention 
and the necessary hardware means and software means for supporting and implementing a 
search means. 

As used herein, "data storage means" refers to memory which can store nucleotide 
sequence information of the present invention, or a memory access means which can 
access manufactures having recorded thereon the nucleotide sequence information of the 
present invention. 

As used herein, "search means" refers to one or more programs which are 
implemented on the computer-based system to compare a target sequence or target 
structural motif with the sequence information stored within the data storage means. 
Search means are used to identify fragments or regions of the present genomic sequences 
which match a particular target sequence or target motif. A variety of known algorithms 
are disclosed publicly, and a variety of commercially available software packages for 
conducting search means are available and can be used in the computer-based systems of 
the present invention. Examples of such software include, but are not limited to, 
MacPattern (EMBL), BLASTN and BLASTX (NCBI). A skilled artisan can readily recognize 
that any one of the available algorithms or implementing software packages for 
conducting homology searches can be adapted for use in the present computer-based systems. 
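
By way of illustration only, the simplest possible search means is an exact-match scan of stored sequences for a target sequence on both strands. The following Python sketch is an assumption made for clarity and is not the BLAST, BLAZE or MacPattern software; the function names, the in-memory dictionary standing in for the data storage means, and the example sequences are all hypothetical.

```python
# Translation table used to complement unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_target(database, target):
    """Return (sequence id, strand, 1-based start) for each exact occurrence.

    database -- dict mapping a sequence identifier to its nucleotide string
    target   -- target sequence to look for on either strand
    Coordinates refer to the stored strand. A target that equals its own
    reverse complement is reported once per strand.
    """
    target = target.upper()
    hits = []
    for seq_id, seq in database.items():
        seq = seq.upper()
        for strand, probe in (("+", target), ("-", reverse_complement(target))):
            start = seq.find(probe)
            while start != -1:
                hits.append((seq_id, strand, start + 1))
                # Resume one base further on so overlapping hits are found.
                start = seq.find(probe, start + 1)
    return hits


# Hypothetical stored sequences and a six-nucleotide target.
database = {"contig_1": "TTGACATATAATGCCGGA", "contig_2": "GGCATTATATGTCAA"}
print(find_target(database, "TATAAT"))
# [('contig_1', '+', 7), ('contig_2', '-', 4)]
```

Homology search programs such as those named above additionally tolerate mismatches and gaps and score the resulting alignments, which this sketch does not attempt.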

As used herein, a "target sequence" can be any DNA or amino acid sequence of six 
or more nucleotides or two or more amino acids. A skilled artisan can readily recognize 
that the longer a target sequence is, the less likely a target sequence will be present as a 
random occurrence in the database. The most preferred sequence length of a target 
sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide 
residues. However, it is well recognized that searches for commercially important 
fragments, such as sequence fragments involved in gene expression and protein 
processing, may be of shorter length. 
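
The relationship between target length and random occurrence can be made concrete with a short Python sketch. It rests on simplifying assumptions that are not stated in the text above: residues are treated as independent and equally frequent, only one strand is counted, and the database size in the usage lines is a hypothetical figure chosen for illustration.

```python
def expected_random_hits(target_length, database_length, alphabet_size=4):
    """Expected number of chance exact matches of one target in a database.

    Assumes independent, equally frequent residues: each of the
    (database_length - target_length + 1) start positions matches with
    probability (1 / alphabet_size) ** target_length.
    """
    positions = max(database_length - target_length + 1, 0)
    return positions * (1.0 / alphabet_size) ** target_length


# Hypothetical database of 3,000,000 nucleotides.
for length in (6, 12, 30):
    print(length, expected_random_hits(length, 3_000_000))

# The same calculation for an amino acid target uses a 20-letter alphabet.
print(expected_random_hits(10, 1_000_000, alphabet_size=20))
```

Under these assumptions a six-nucleotide target is expected to occur by chance hundreds of times in such a database, whereas a 30-nucleotide target is expected essentially never, which is consistent with the preferred lengths given above.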

As used herein, "a target structural motif," or "target motif," refers to any 
rationally selected sequence or combination of sequences in which the sequence(s) are 
chosen based on a three-dimensional configuration which is formed upon the folding of 
the target motif. There are a variety of target motifs known in the art. Protein target 
motifs include, but are not limited to, enzymic active sites and signal sequences. Nucleic 
acid target motifs include, but are not limited to, promoter sequences, hairpin structures 
and inducible expression elements (protein binding sequences). 

A variety of structural formats for the input and output means can be used to 
input and output the information in the computer-based systems of the present 
invention. A preferred format for an output means ranks fragments of the Enterococcus 
faecalis genomic sequences possessing varying degrees of homology to the target sequence 
or target motif. Such presentation provides a skilled artisan with a ranking of sequences 
which contain various amounts of the target sequence or target motif and identifies the 
degree of homology contained in the identified fragment. 

A variety of comparing means can be used to compare a target sequence or target 
motif with the data storage means to identify sequence fragments of the Enterococcus 
faecalis genome. In the present examples, implementing software which implements the 
BLAST algorithm, described in Altschul et al. (1990) J. Mol. Biol. 215:403-410, is used 
to identify open reading frames within the Enterococcus faecalis genome. A skilled 
artisan can readily recognize that any one of the publicly available homology search 
programs can be used as the search means for the computer-based systems of the present 
invention. Of course, suitable proprietary systems that may be known to those of skill 
also may be employed in this regard. 

Figure 1 provides a block diagram of a computer system illustrative of 
embodiments of this aspect of the present invention. The computer system 102 includes a 
processor 106 connected to a bus 104. Also connected to the bus 104 are a main 
memory 108 (preferably implemented as random access memory, RAM) and a variety of 
secondary storage devices 110, such as a hard drive 112 and a removable medium storage 
device 114. The removable medium storage device 114 may represent, for example, a 
floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage 
medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing 
control logic and/or data recorded therein may be inserted into the removable medium 
storage device 114. The computer system 102 includes appropriate software for reading 
the control logic and/or the data from the removable storage medium 116, once it 
is inserted into the removable medium storage device 114. 

A nucleotide sequence of the present invention may be stored in a well known 
manner in the main memory 108, any of the secondary storage devices 110, and/or a 
removable storage medium 116. During execution, software for accessing and processing 
the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory 
108, in accordance with the requirements and operating parameters of the operating 
system, the hardware system and the software program or programs. 

BIOCHEMICAL EMBODIMENTS 

Other embodiments of the present invention are directed to isolated fragments of 
the Enterococcus faecalis genome. The fragments of the Enterococcus faecalis genome 
of the present invention include, but are not limited to, fragments which encode peptides, 
hereinafter open reading frames (ORFs), fragments which modulate the expression of an 
operably linked ORF, hereinafter expression modulating fragments (EMFs) and fragments 
which can be used to diagnose the presence of Enterococcus faecalis in a sample, 
hereinafter diagnostic fragments (DFs). 

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the 
Enterococcus faecalis genome" refers to a nucleic acid molecule possessing a specific 
nucleotide sequence which has been subjected to purification means to reduce, from the 
composition, the number of compounds which are normally associated with the 
composition. Particularly, the term refers to the nucleic acid molecules having the 
sequences set out in SEQ ID NOS: 1-982, to representative fragments thereof as described 
above, to polynucleotides at least 95%, preferably at least 99% and especially preferably 
at least 99.9% identical in sequence thereto, also as set out above. 

A variety of purification means can be used to generate the isolated fragments of 
the present invention. These include, but are not limited to, methods which separate 
constituents of a solution based on charge, solubility, or size. 

In one embodiment, Enterococcus faecalis DNA can be enzymatically sheared to 
produce fragments of 15-20 kb in length. These fragments can then be used to generate an 
Enterococcus faecalis library by inserting them into lambda clones as described in the 
Examples below. Primers flanking, for example, an ORF, such as those enumerated in 
Tables 1-3, can then be generated using nucleotide sequence information provided in SEQ 
ID NOS: 1-982. Well known and routine techniques of PCR cloning then can be used to 
isolate the ORF from the lambda DNA library or Enterococcus faecalis genomic DNA. 
Thus, given the availability of SEQ ID NOS: 1-982, the information in Tables 1, 2 and 3, 
and the information that may be obtained readily by analysis of the sequences of SEQ ID 
NOS: 1-982 using methods set out above, those of skill will be enabled by the present 
disclosure to isolate any ORF-containing or other nucleic acid fragment of the present 
invention. 

The isolated nucleic acid molecules of the present invention include, but are not 
limited to, single stranded and double stranded DNA, and single stranded RNA. As used 
herein, an "open reading frame," ORF, means a series of triplets coding for amino acids 
without any termination codons and is a sequence translatable into protein. Each 
sequence of SEQ ID NOS: 1-982, however, begins and ends with a termination codon. For 
purposes of numbering and reference to polynucleotide and polypeptide sequences the 
entire sequence of each sequence of SEQ ID NOS: 1-982 is included with the first 
nucleotide being position 1. Therefore, for reference purposes the numbering used in the 
present invention is that provided in the sequence listing for SEQ ID NOS: 1-982. 
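
The stop-to-stop convention described above can be sketched in Python as follows. This is an illustrative assumption only and is not the GeneMark software or any method used to prepare the sequence listing; the function name, the `min_codons` parameter and the example sequence are hypothetical, and only the three forward reading frames are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(seq, min_codons=2):
    """Find stretches that begin and end with an in-frame termination codon.

    Returns (frame, start, end) tuples with 1-based inclusive coordinates,
    numbered from the first nucleotide of seq as position 1. Both flanking
    termination codons are included in the reported span, and the codons
    between them contain no termination codon.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        previous_stop = None  # 0-based index of the latest in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if previous_stop is not None:
                    codons_between = (i - previous_stop) // 3 - 1
                    if codons_between >= min_codons:
                        found.append((frame + 1, previous_stop + 1, i + 3))
                previous_stop = i
    return found


# Hypothetical sequence: TAA ATG GCT GAA TTT TAG
print(stop_to_stop_orfs("TAAATGGCTGAATTTTAG"))  # [(1, 1, 18)]
```

The opposite strand could be scanned by applying the same function to the reverse complement and converting the coordinates back.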

Tables 1, 2, and 3 list ORFs in the Enterococcus faecalis genomic contigs of the 
present invention that were identified as putative coding regions by the GeneMark 
software using organism-specific second-order Markov probability transition matrices. It 
will be appreciated that other criteria can be used, in accordance with well known 
analytical methods, such as those discussed herein, to generate more inclusive, more 
restrictive, or more selective lists. 

Table 1 sets out ORFs in the Enterococcus faecalis contigs of the present 
invention that over a continuous region of at least 50 bases are 95% or more identical (by 
BLAST analysis) to a nucleotide sequence available through GenBank in March, 1997. 

Table 2 sets out ORFs in the Enterococcus faecalis contigs of the present 
invention that are not in Table 1 and match, with a BLASTP probability score of 0.01 or 
less, a polypeptide sequence available through GenBank in March, 1997. 

Table 3 sets out ORFs in the Enterococcus faecalis contigs of the present 
invention that do not match significantly, by BLASTP analysis, a polypeptide sequence 
available through GenBank in March, 1997. 
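
The three table definitions above amount to a simple decision rule, sketched here in Python. The function and parameter names are hypothetical, and the sketch assumes that "do not match significantly" in Table 3 means failing the same 0.01 BLASTP probability cutoff used for Table 2, which the text does not state explicitly.

```python
def assign_table(nt_identity_pct=None, nt_match_length=None,
                 blastp_probability=None):
    """Assign an ORF to Table 1, 2 or 3 from its best database matches.

    nt_identity_pct    -- percent identity of the best nucleotide BLAST match
    nt_match_length    -- length in bases of that continuous matching region
    blastp_probability -- BLASTP probability score of the best protein match
    A value of None means that no match of that kind was found.
    """
    # Table 1: at least 95% identical over at least 50 continuous bases.
    if (nt_identity_pct is not None and nt_match_length is not None
            and nt_identity_pct >= 95.0 and nt_match_length >= 50):
        return 1
    # Table 2: not in Table 1, BLASTP probability score of 0.01 or less.
    if blastp_probability is not None and blastp_probability <= 0.01:
        return 2
    # Table 3: no significant BLASTP match (assumed cutoff, see lead-in).
    return 3


print(assign_table(97.0, 120, 1e-30))  # 1
print(assign_table(80.0, 200, 1e-5))   # 2
print(assign_table(99.0, 30, None))    # 3
```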

In each table, the first and second columns identify the ORF by, respectively, 
contig number and ORF number within the contig; the third column indicates the 
coordinate of the first nucleotide of the ORF, counting from the 5' end of the contig 
strand; the fourth column indicates the coordinate of the final nucleotide of the ORF, 
counting from the 5' end of the contig strand. 

In Tables 1 and 2, column five lists the Reference for the closest matching 
sequence available through GenBank. These reference numbers are the database entry 
numbers commonly used by those of skill in the art, who will be familiar with their 
denominators. Descriptions of the nomenclature are available from the National Center 
for Biotechnology Information. Column six in Tables 1 and 2 provides the gene name of 
the matching sequence. 

In Table 1, column seven provides the nucleotide BLAST percent identity score 
from the comparison of the ORF and the GenBank sequence, column eight indicates the 
length in nucleotides of the highest scoring segment pair identified by the BLAST identity 
analysis, and column nine provides the total length of the ORF in nucleotides. 

In Table 2, column seven provides the protein BLAST percent similarity of the 
highest scoring segment pair identified, column eight provides the percent identity of the 
highest scoring segment pair, and column nine provides the total length of the ORF in 
nucleotides. 

The concepts of percent identity and percent similarity of two polypeptide 
sequences are well understood in the art. For example, two polypeptides 10 amino acids in 
length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to 
have a percent identity of 70%. However, the same two polypeptides would be deemed to 
have a percent similarity of 80% if, for example at position 5, the amino acid moieties, 
although not identical, were "similar" (i.e., possessed similar biochemical characteristics). 
Many programs for analysis of nucleotide or amino acid sequence similarity, such as FASTA 
and BLAST, specifically list percent identity of a matching region as an output parameter. 
Thus, for instance, Tables 1 and 2 herein enumerate the percent identity of the highest 
scoring segment pair in each ORF and its listed relative. Further details concerning the 
algorithms and criteria used for homology searches are provided below and are described in 
the pertinent literature highlighted by the citations provided below. 
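
The worked example above can be reproduced with a minimal Python sketch. The two peptide sequences and the set of "similar" residue pairs are hypothetical values chosen for illustration; programs such as BLAST derive similarity from a substitution matrix rather than from a hand-written set of pairs.

```python
def percent_identity_and_similarity(seq_a, seq_b, similar_pairs):
    """Return (percent identity, percent similarity) of two aligned peptides.

    seq_a, seq_b  -- ungapped sequences of equal length, compared by position
    similar_pairs -- set of frozensets naming non-identical residues that are
                     nevertheless treated as biochemically similar
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(a == b for a, b in zip(seq_a, seq_b))
    similar = sum(a != b and frozenset((a, b)) in similar_pairs
                  for a, b in zip(seq_a, seq_b))
    n = len(seq_a)
    return 100.0 * identical / n, 100.0 * (identical + similar) / n


# Two 10-residue peptides differing at positions 1, 3 and 5; the residues at
# position 5 (isoleucine and valine) are treated as similar.
peptide_1 = "AKVLIGTDEP"
peptide_2 = "GKTLVGTDEP"
print(percent_identity_and_similarity(peptide_1, peptide_2,
                                      {frozenset("IV")}))  # (70.0, 80.0)
```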

It will be appreciated that other criteria can be used to generate more inclusive 
and more exclusive listings of the types set out in the tables. As those of skill will 
appreciate, narrow and broad searches both are useful. Thus, a skilled artisan can readily 
identify ORFs in contigs of the Enterococcus faecalis genome other than those listed in 
Tables 1-3, such as ORFs which are overlapping or encoded by the opposite strand of an 
identified ORF in addition to those ascertainable using the computer-based systems of the 
present invention. 

As used herein, an "expression modulating fragment," EMF, means a series of 
nucleotide molecules which modulates the expression of an operably linked ORF or EMF. 

As used herein, a sequence is said to "modulate the expression of an operably 
linked sequence" when the expression of the sequence is altered by the presence of the 
EMF. EMFs include, but are not limited to, promoters, and promoter modulating 
sequences (inducible elements). One class of EMFs are fragments which induce the 
expression of an operably linked ORF in response to a specific regulatory factor or 
physiological event. 

EMF sequences can be identified within the contigs of the Enterococcus faecalis 
genome by their proximity to the ORFs provided in Tables 1-3. An intergenic segment, 
or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length, 
taken from any one of the ORFs of Tables 1-3 will modulate the expression of an 
operably linked ORF in a fashion similar to that found with the naturally linked ORF 
sequence. As used herein, an "intergenic segment" refers to fragments of the 
Enterococcus faecalis genome which are between two ORF(s) herein described. EMFs also 
can be identified using known EMFs as a target sequence or target motif in the 
computer-based systems of the present invention. Further, the two methods can be 
combined and used together. 
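
Locating intergenic segments from ORF coordinates of the kind listed in Tables 1-3 can be sketched in Python as follows. The function name and the example coordinates are hypothetical, and the sketch assumes that an ORF on the opposite strand may be listed with its first coordinate larger than its last, so each pair is normalized before sorting.

```python
def intergenic_segments(orfs):
    """Return (start, end) of each gap between consecutive ORFs on a contig.

    orfs -- iterable of (first, last) 1-based inclusive ORF coordinates,
            counted from the 5' end of the contig strand.
    Overlapping or abutting ORFs leave no gap and contribute no segment.
    """
    ordered = sorted((min(a, b), max(a, b)) for a, b in orfs)
    segments = []
    if not ordered:
        return segments
    furthest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start - 1 >= furthest_end + 1:
            segments.append((furthest_end + 1, start - 1))
        furthest_end = max(furthest_end, end)
    return segments


# Hypothetical ORF coordinates on one contig; the last ORF is reversed.
orfs = [(100, 400), (450, 900), (880, 1200), (1500, 1300)]
for start, end in intergenic_segments(orfs):
    print(start, end, end - start + 1)
# 401 449 49
# 1201 1299 99
```

A candidate EMF of about 10 to 200 nucleotides could then be taken as all or part of such a segment and tested as described below.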

The presence and activity of an EMF can be confirmed using an EMF trap vector. 
An EMF trap vector contains a cloning site linked to a marker sequence. A marker 
sequence encodes an identifiable phenotype, such as antibiotic resistance or a 
complementing nutrition auxotrophic factor, which can be identified or assayed when the 
EMF trap vector is placed within an appropriate host under appropriate conditions. As 
described above, an EMF will modulate the expression of an operably linked marker 
sequence. A more detailed discussion of various marker sequences is provided below. 

A sequence which is suspected as being an EMF is cloned in all three reading 
frames in one or more restriction sites upstream from the marker sequence in the EMF 
trap vector. The vector is then transformed into an appropriate host using known 
procedures and the phenotype of the transformed host is examined under appropriate 
conditions. As described above, an EMF will modulate the expression of an operably 
linked marker sequence. 

As used herein, a "diagnostic fragment," DF, means a series of nucleotide 
molecules which selectively hybridize to Enterococcus faecalis sequences. DFs can be 
readily identified by identifying unique sequences within contigs of the Enterococcus 
faecalis genome, such as by using well-known computer analysis software, and by 
generating and testing probes or amplification primers consisting of the DF sequence in 
an appropriate diagnostic format which determines amplification or hybridization 
selectivity. 

The sequences falling within the scope of the present invention are not limited to 
the specific sequences herein described, but also include allelic and species variations 
thereof. Allelic and species variations can be routinely determined by comparing the 
sequences provided in SEQ ID NOS: 1-982, a representative fragment thereof, or a 
nucleotide sequence at least 99% and preferably 99.9% identical to SEQ ID NOS: 1-982, 
with a sequence from another isolate of the same species. Furthermore, to accommodate 
codon variability, the invention includes nucleic acid molecules coding for the same 
amino acid sequences as do the specific ORFs disclosed herein. In other words, in the 
coding region of an ORF, substitution of one codon for another which encodes the same 
amino acid is expressly contemplated. 

Any specific sequence disclosed herein can be readily screened for errors by 
resequencing a particular fragment, such as an ORF, in both directions (i.e., sequence both 
strands). Alternatively, error screening can be performed by sequencing corresponding 
polynucleotides of Enterococcus faecalis origin isolated by using part or all of the 
fragments in question as a probe or primer. 

Each of the ORFs of the Enterococcus faecalis genome disclosed in Tables 1, 2 
and 3, and the EMFs found 5' to the ORFs, can be used as polynucleotide reagents in 
numerous ways. For example, the sequences can be used as diagnostic probes or diagnostic 
amplification primers to detect the presence of a specific microbe in a sample, 
particularly Enterococcus faecalis. Especially preferred in this regard are ORFs such as 
those of Table 3, which do not match previously characterized sequences from other 
organisms and thus are most likely to be highly selective for Enterococcus faecalis. Also 
particularly preferred are ORFs that can be used to distinguish between strains of 
Enterococcus faecalis, particularly those that distinguish medically important strains, such 
as drug-resistant strains. 

In addition, the fragments of the present invention, as broadly described, can be 
used to control gene expression through triple helix formation or antisense DNA or RNA, 
both of which methods are based on the binding of a polynucleotide sequence to DNA or 
RNA. Triple helix formation optimally results in a shut-off of RNA transcription from 
DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into 
polypeptide. Information from the sequences of the present invention can be used to 
design antisense and triple helix-forming oligonucleotides. Polynucleotides suitable for 
use in these methods are usually 20 to 40 bases in length and are designed to be 
complementary to a region of the gene involved in transcription, for triple-helix 
formation, or to the mRNA itself, for antisense inhibition. Both techniques have been 
demonstrated to be effective in model systems, and the requisite techniques are well 
known and involve routine procedures. Triple helix techniques are discussed in, for 
example, Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 
(1988); and Dervan et al., Science 251:1360 (1991). Antisense techniques in general are 
discussed in, for instance, Okano, J. Neurochem. 56:560 (1991) and 
Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca 
Raton, FL (1988). 

The present invention further provides recombinant constructs comprising one or 
more fragments of the Enterococcus faecalis genomic fragments and contigs of the 
present invention. Certain preferred recombinant constructs of the present invention 
comprise a vector, such as a plasmid or viral vector, into which a fragment of the 
Enterococcus faecalis genome has been inserted, in a forward or reverse orientation. In 
the case of a vector comprising one of the ORFs of the present invention, the vector 
may further comprise regulatory sequences, including for example, a promoter, operably 
linked to the ORF. For vectors comprising the EMFs of the present invention, the 
vector may further comprise a marker sequence or heterologous ORF operably linked to 
the EMF. 

Large numbers of suitable vectors and promoters are known to those of skill in 
the art and are commercially available for generating the recombinant constructs of the 
present invention. The following vectors are provided by way of example. Useful 
bacterial vectors include phagescript, PsiX174, pBS SK (+ or -), pBS KS (+ or -), pNH8a, 
pNH16a, pNH18a, pNH46a (available from Stratagene); pTrc99A, pKK223-3, pKK233-3, 
pDR540, pRIT5 (available from Pharmacia). Useful eukaryotic vectors include 
pWLneo, pSV2cat, pOG44, pXT1, pSG (available from Stratagene); pSVK3, pBPV, pMSG, 
pSVL (available from Pharmacia). 

Promoter regions can be selected from any desired gene using CAT 
(chloramphenicol transferase) vectors or other vectors with selectable markers. Two 
appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters 
include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV 
immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and 
mouse metallothionein-I. Selection of the appropriate vector and promoter is well 
within the level of ordinary skill in the art. 

The present invention further provides host cells containing any one of the 
isolated fragments of the Enterococcus faecalis genomic fragments and contigs of the 
present invention, wherein the fragment has been introduced into the host cell using 
known methods. The host cell can be a higher eukaryotic host cell, such as a 
mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or a procaryotic cell, 
such as a bacterial cell. 

A polynucleotide of the present invention, such as a recombinant construct 
comprising an ORF of the present invention, may be introduced into the host by a 
variety of well established techniques that are standard in the art, such as calcium 
phosphate transfection, DEAE-dextran mediated transfection and electroporation, which 
are described in, for instance, Davis, L. et al., BASIC METHODS IN MOLECULAR 
BIOLOGY (1986). 

A host cell containing one of the fragments of the Enterococcus faecalis genomic 
fragments and contigs of the present invention can be used in conventional manners to 
produce the gene product encoded by the isolated fragment (in the case of an ORF) or can 
be used to produce a heterologous protein under the control of the EMF. The 
present invention further provides isolated polypeptides encoded by the nucleic acid 
fragments of the present invention or by degenerate variants of the nucleic acid 
fragments of the present invention. By "degenerate variant" is intended nucleotide 
fragments which differ from a nucleic acid fragment of the present invention (e.g., an 
ORF) by nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an 
identical polypeptide sequence. 
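
The notion of a degenerate variant can be illustrated with a short Python sketch that translates two different nucleotide sequences and compares the resulting polypeptides. The sketch assumes the standard genetic code and uses hypothetical function names and example sequences; none of the sequences is taken from SEQ ID NOS: 1-982.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate from the first base, stopping at the first stop codon."""
    seq = seq.upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def is_degenerate_variant(seq_a, seq_b):
    """True if the nucleotide sequences differ but encode the same peptide."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)


original = "ATGGCTCTGAAATAA"   # Met-Ala-Leu-Lys-stop
variant = "ATGGCCTTAAAGTAG"    # different codons, same polypeptide
changed = "ATGGATCTGAAATAA"    # Ala codon replaced by an Asp codon
print(translate(original), translate(variant))   # MALK MALK
print(is_degenerate_variant(original, variant))  # True
print(is_degenerate_variant(original, changed))  # False
```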

Preferred nucleic acid fragments of the present invention are the ORFs depicted 
in Tables 2 and 3 which encode proteins. 

A variety of methodologies known in the art can be utilized to obtain any one of 
the isolated polypeptides or proteins of the present invention. At the simplest level, the 
amino acid sequence can be synthesized using commercially available peptide synthesizers. 
This is particularly useful in producing small peptides and fragments of larger 
polypeptides. Such short fragments as may be obtained most readily by synthesis are 
useful, for example, in generating antibodies against the native polypeptide, as discussed 
further below. 

In an alternative method, the polypeptide or protein is purified from bacterial 
cells which naturally produce the polypeptide or protein. One skilled in the art can 
readily employ well-known methods for isolating polypeptides and proteins to isolate and 
purify polypeptides or proteins of the present invention produced naturally by a bacterial 
strain, or by other methods. Methods for isolation and purification that can be employed 
in this regard include, but are not limited to, immunochromatography, HPLC, 
size-exclusion chromatography, ion-exchange chromatography, and immuno-affinity 
chromatography. 

The polypeptides and proteins of the present invention also can be purified from 
cells which have been altered to express the desired polypeptide or protein. Preferred 
polypeptides and proteins of the present invention are polypeptides and proteins coded 
for by the polynucleotides of SEQ ID NOS: 1-982, wherein the polypeptides and proteins 
are coded in the same frame as the termination codon at the end of each sequence of SEQ 
ID NOS: 1-982. As used herein, a cell is said to be altered to express a desired polypeptide 
or protein when the cell, through genetic manipulation, is made to produce a polypeptide 
or protein which it normally does not produce or which the cell normally produces at a 
lower level. Those skilled in the art can readily adapt procedures for introducing and 
expressing either recombinant or synthetic sequences into eukaryotic or prokaryotic cells 
in order to generate a cell which produces one of the polypeptides or proteins of the 
present invention. 

The polypeptides of the present invention are preferably provided in an isolated 
form, and preferably are substantially purified. A recombinantly produced version of the 
E. faecalis polypeptide can be substantially purified by the one-step method described by 
Smith et al. (1988) Gene 67:31-40. Polypeptides of the invention also can be purified 
from natural or recombinant sources using antibodies directed against the polypeptides of 
the invention in methods which are well known in the art of protein purification. 

The invention further provides for isolated E. faecalis polypeptides comprising 
an amino acid sequence selected from the group including: (a) the amino acid sequence of 
a full-length E. faecalis polypeptide having the complete amino acid sequence from the 
first methionine codon to the termination codon of each sequence listed in SEQ ID 
NOS: 1-982, wherein said termination codon is at the end of each SEQ ID NO: and said 
first methionine is the first methionine in frame with said termination codon; and (b) the 
amino acid sequence of a full-length E. faecalis polypeptide having the complete amino 
acid sequence in (a) excepting the N-terminal methionine. 
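
The definition in (a) and (b) above can be sketched in Python as follows. This is an illustrative assumption rather than the procedure used to prepare the sequence listing: it assumes the standard genetic code, treats only ATG as the methionine codon, and uses a hypothetical function name and example sequence.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def full_length_polypeptide(seq):
    """Translate from the first in-frame methionine to the terminal stop.

    The reading frame is fixed by the termination codon at the 3' end of
    seq. Returns the polypeptide of definition (a), or None if no ATG is
    found in frame between the last internal stop and the terminal stop.
    """
    seq = seq.upper()
    if len(seq) < 6 or seq[-3:] not in STOP_CODONS:
        raise ValueError("sequence must end with a termination codon")
    frame = len(seq) % 3  # offset that puts the final three bases in frame
    codons = [seq[i:i + 3] for i in range(frame, len(seq), 3)]
    internal = codons[:-1]  # every in-frame codon except the terminal stop
    last_stop = max((i for i, c in enumerate(internal) if c in STOP_CODONS),
                    default=-1)
    for i in range(last_stop + 1, len(internal)):
        if internal[i] == "ATG":
            return "".join(CODON_TABLE[c] for c in internal[i:])
    return None


# Hypothetical sequence: TAA CCA ATG GCT ATG AAA TGA
polypeptide = full_length_polypeptide("TAACCAATGGCTATGAAATGA")
print(polypeptide)      # MAMK  (definition (a))
print(polypeptide[1:])  # AMK   (definition (b), N-terminal methionine removed)
```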

The polypeptides of the present invention also include polypeptides having an 
amino acid sequence at least 80% identical, more preferably at least 90% identical, and 
still more preferably 95%, 96%, 97%, 98% or 99% identical to those described in (a) and 
(b) above. 

The present invention is further directed to polynucleotides encoding portions or
fragments of the amino acid sequences described herein as well as to portions or fragments
of the isolated amino acid sequences described herein. Fragments include portions of the
amino acid sequences described herein, are at least 5 contiguous amino acids in length, and
are specified by any two integers, one representing an N-terminal position and the other
a C-terminal position. The initiation codon of the polypeptides of the present invention
is position 1. The initiation codon (position 1) for purposes of the present invention is
the first methionine codon of each sequence of SEQ ID NOS: 1-982 which is in frame with
the termination codon at the end of each said sequence. Every combination of an
N-terminal and C-terminal position that a fragment at least 5 contiguous amino acid
residues in length could occupy, on any given amino acid sequence encoded by a sequence
of SEQ ID NOS: 1-982, is included in the invention, i.e., from the initiation codon up to
the termination codon. "At least" means a fragment may be 5 contiguous amino acid
residues in length or any integer between 5 and the number of residues in a full length
amino acid sequence minus 1. Therefore, included in the invention are contiguous
fragments specified by any N-terminal and C-terminal positions of an amino acid sequence
set forth in SEQ ID NOS: 1-982 wherein the length of the contiguous fragment is any
integer between 5 and the number of residues in a full length sequence minus 1.

Further, the invention includes polypeptides comprising fragments specified by
size, in amino acid residues, rather than by N-terminal and C-terminal positions. The
invention includes any fragment size, in contiguous amino acid residues, selected from
integers between 5 and the number of residues in a full length sequence minus 1. Preferred
sizes of contiguous polypeptide fragments include about 5 amino acid residues, about 10
amino acid residues, about 20 amino acid residues, about 30 amino acid residues, about 40
amino acid residues, about 50 amino acid residues, about 100 amino acid residues, about
200 amino acid residues, about 300 amino acid residues, and about 400 amino acid
residues. The preferred sizes are, of course, meant to exemplify, not limit, the present
invention as all size fragments representing any integer between 5 and the number of
residues in a full length sequence minus 1 are included in the invention. The present
invention also provides for the exclusion of any fragments specified by N-terminal and
C-terminal positions or by size in amino acid residues as described above. Any number of
fragments specified by N-terminal and C-terminal positions or by size in amino acid
residues as described above may be excluded.
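
The two ways of specifying fragments described above, by N-terminal and C-terminal positions or by size, can be made concrete with a short sketch. The function names and the 10-residue example are hypothetical; positions are 1-based and inclusive, with position 1 being the initiating methionine, and "between 5 and the full length minus 1" is read as inclusive of both ends.

```python
from typing import Iterator, Tuple


def fragment_positions(n_residues: int, min_len: int = 5) -> Iterator[Tuple[int, int]]:
    """Yield (N-terminal, C-terminal) positions, 1-based and inclusive, for
    every contiguous fragment of min_len to n_residues - 1 residues."""
    for start in range(1, n_residues + 1):
        for end in range(start + min_len - 1, n_residues + 1):
            length = end - start + 1
            if length < n_residues:  # exclude the full-length sequence
                yield (start, end)


def fragment_sizes(n_residues: int, min_len: int = 5) -> range:
    """Fragment sizes: integers from min_len to n_residues - 1."""
    return range(min_len, n_residues)


# Hypothetical 10-residue polypeptide (assumption, for illustration only).
positions = list(fragment_positions(10))
sizes = list(fragment_sizes(10))
```

For the hypothetical 10-residue sequence this enumerates 20 position pairs, from (1, 5) to (6, 10), and the sizes 5 through 9. An excluded fragment would simply be removed from the enumerated set.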

The above fragments need not be active since they would be useful, for example, 
in immunoassays, in epitope mapping, epitope tagging, to generate antibodies to a 
particular portion of the protein, as vaccines, and as molecular weight markers. 

Further polypeptides of the present invention include polypeptides which have at
least 90% similarity, more preferably at least 95% similarity, and still more preferably at
least 96%, 97%, 98% or 99% similarity to those described above.

A further embodiment of the invention relates to a polypeptide which comprises
the amino acid sequence of an E. faecalis polypeptide having an amino acid sequence which
contains at least one conservative amino acid substitution, but not more than 50
conservative amino acid substitutions, not more than 40 conservative amino acid
substitutions, not more than 30 conservative amino acid substitutions, and not more than
20 conservative amino acid substitutions. Also provided are polypeptides which comprise
the amino acid sequence of an E. faecalis polypeptide, having at least one, but not more
than 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 conservative amino acid substitutions.

By a polypeptide having an amino acid sequence at least, for example, 95%
"identical" to a query amino acid sequence of the present invention, it is intended that the
amino acid sequence of the subject polypeptide is identical to the query sequence except
that the subject polypeptide sequence may include up to five amino acid alterations per
each 100 amino acids of the query amino acid sequence. In other words, to obtain a
polypeptide having an amino acid sequence at least 95% identical to a query amino acid
sequence, up to 5% of the amino acid residues in the subject sequence may be inserted,
deleted (indels), or substituted with another amino acid. These alterations of the
reference sequence may occur at the amino or carboxy terminal positions of the
reference amino acid sequence or anywhere between those terminal positions, interspersed
either individually among residues in the reference sequence or in one or more contiguous
groups within the reference sequence.

As a practical matter, whether any particular polypeptide is at least 90%, 95%,
96%, 97%, 98% or 99% identical to the amino acid sequences encoded by the sequences
of SEQ ID NOS: 1-982, as described herein, can be determined conventionally using
known computer programs. A preferred method for determining the best overall match
between a query sequence (a sequence of the present invention) and a subject sequence,
also referred to as a global sequence alignment, uses the FASTDB
computer program based on the algorithm of Brutlag et al., (1990) Comp. App. Biosci.
6:237-245. In a sequence alignment the query and subject sequences are both amino acid
sequences. The result of said global sequence alignment is in percent identity. Preferred
parameters used in a FASTDB amino acid alignment are: Matrix=PAM 0, k-tuple=2,
Mismatch Penalty=1, Joining Penalty=20, Randomization Group Length=0, Cutoff
Score=1, Window Size=sequence length, Gap Penalty=5, Gap Size Penalty=0.05, Window
Size=500 or the length of the subject amino acid sequence, whichever is shorter.

If the subject sequence is shorter than the query sequence due to N- or C-terminal
deletions, not because of internal deletions, the results, in percent identity, must be
manually corrected. This is because the FASTDB program does not account for N- and
C-terminal truncations of the subject sequence when calculating global percent identity. For
subject sequences truncated at the N- and C-termini, relative to the query sequence, the
percent identity is corrected by calculating the number of residues of the query sequence
that are N- and C-terminal of the subject sequence, which are not matched/aligned with a
corresponding subject residue, as a percent of the total residues of the query sequence.
Whether a residue is matched/aligned is determined by results of the FASTDB sequence
alignment. This percentage is then subtracted from the percent identity, calculated by
the above FASTDB program using the specified parameters, to arrive at a final percent
identity score. This final percent identity score is what is used for the purposes of the
present invention. Only residues to the N- and C-termini of the subject sequence, which
are not matched/aligned with the query sequence, are considered for the purposes of
manually adjusting the percent identity score. That is, only query amino acid residues
outside the farthest N- and C-terminal residues of the subject sequence are considered.

For example, a 90 amino acid residue subject sequence is aligned with a 100 residue
query sequence to determine percent identity. The deletion occurs at the N-terminus of
the subject sequence and therefore, the FASTDB alignment does not match/align with the
first 10 residues at the N-terminus. The 10 unpaired residues represent 10% of the
sequence (number of residues at the N- and C-termini not matched/total number of
residues in the query sequence) so 10% is subtracted from the percent identity score
calculated by the FASTDB program. If the remaining 90 residues were perfectly matched
the final percent identity would be 90%. In another example, a 90 residue subject
sequence is compared with a 100 residue query sequence. This time the deletions are
internal so there are no residues at the N- or C-termini of the subject sequence which are
not matched/aligned with the query. In this case the percent identity calculated by
FASTDB is not manually corrected. Once again, only residue positions outside the N- and
C-terminal ends of the subject sequence, as displayed in the FASTDB alignment, which
are not matched/aligned with the query sequence are manually corrected. No other
manual corrections are to be made for the purposes of the present invention.
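
The manual correction described above amounts to a single subtraction, which the following sketch reproduces for the two examples. It is an illustration under stated assumptions, not part of the FASTDB program: the function name and its parameters are hypothetical, the FASTDB score is supplied by the caller, and the 90.0 score used for the internal-deletion case is an assumed value.

```python
def corrected_percent_identity(fastdb_identity: float, query_len: int,
                               unmatched_n_term: int, unmatched_c_term: int) -> float:
    """Subtract from the FASTDB percent identity the query residues lying
    outside the N- and C-terminal ends of the subject sequence, expressed
    as a percent of the total residues in the query sequence."""
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    unmatched = unmatched_n_term + unmatched_c_term
    penalty = 100.0 * unmatched / query_len
    return fastdb_identity - penalty


# First example: 100-residue query, 90-residue subject lacking the first 10
# N-terminal residues, remaining 90 residues perfectly matched.
n_terminal_case = corrected_percent_identity(100.0, 100, 10, 0)

# Second example: internal deletions only, so nothing is subtracted.
# The FASTDB score of 90.0 is an assumed value for illustration.
internal_case = corrected_percent_identity(90.0, 100, 0, 0)
```

The first case gives a final percent identity of 90%, as stated in the text; in the second case the score calculated by FASTDB is returned unchanged.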

The above polypeptide sequences are included irrespective of whether they have
their normal biological activity. This is because even where a particular polypeptide
molecule does not have biological activity, one of skill in the art would still know how to
use the polypeptide, for instance, as a vaccine or to generate antibodies. Other uses of
the polypeptides of the present invention that do not have E. faecalis activity include,
inter alia, as epitope tags, in epitope mapping, and as molecular weight markers on
SDS-PAGE gels or on molecular sieve gel filtration columns using methods known to
those of skill in the art.

As described below, the polypeptides of the present invention can also be used to
raise polyclonal and monoclonal antibodies, which are useful in assays for detecting E.
faecalis protein expression or as agonists and antagonists capable of enhancing or
inhibiting E. faecalis protein function. Further, such polypeptides can be used in the
yeast two-hybrid system to "capture" E. faecalis protein binding proteins which are also
candidate agonists and antagonists according to the present invention. See, e.g., Fields et
al. (1989) Nature 340:245-246.

Any host/vector system can be used to express one or more of the ORFs of the
present invention. These include, but are not limited to, eukaryotic hosts such as HeLa
cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B.
subtilis. The most preferred cells are those which do not normally express the particular
polypeptide or protein or which express the polypeptide or protein at a low natural level.

"Recombinant," as used herein, means that a polypeptide or protein is derived
from recombinant (e.g., microbial or mammalian) expression systems. "Microbial" refers
to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast)
expression systems. As a product, "recombinant microbial" defines a polypeptide or
protein essentially free of native endogenous substances and unaccompanied by associated
native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g.,
E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in
yeast will have a glycosylation pattern different from that expressed in mammalian cells.

"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides.
Generally, DNA segments encoding the polypeptides and proteins provided by this
invention are assembled from fragments of the Enterococcus faecalis genome and short
oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene
which is capable of being expressed in a recombinant transcriptional unit comprising
regulatory elements derived from a microbial or viral operon.

Recombinant expression vehicle or "vector" refers to a plasmid or phage or virus
or vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression
vehicle can comprise a transcriptional unit comprising an assembly of (1) genetic
regulatory elements necessary for gene expression in the host, including elements required
to initiate and maintain transcription at a level sufficient for suitable expression of the
desired polypeptide, including, for example, promoters and, where necessary, an enhancer
and a polyadenylation signal; (2) a structural or coding sequence which is transcribed into
mRNA and translated into protein; and (3) appropriate signals to initiate translation at
the beginning of the desired coding region and terminate translation at its end. Structural
units intended for use in yeast or eukaryotic expression systems preferably include a
leader sequence enabling extracellular secretion of translated protein by a host cell.
Alternatively, where recombinant protein is expressed without a leader or transport
sequence, it may include an N-terminal methionine residue. This residue may or may not
be subsequently cleaved from the expressed recombinant protein to provide a final
product.

"Recombinant expression system" means host cells which have stably integrated a
recombinant transcriptional unit into chromosomal DNA or carry the recombinant
transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic.
Recombinant expression systems as defined herein will express heterologous polypeptides
or proteins upon induction of the regulatory elements linked to the DNA segment or
synthetic gene to be expressed.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other
cells under the control of appropriate promoters. Cell-free translation systems can also
be employed to produce such proteins using RNAs derived from the DNA constructs of
the present invention. Appropriate cloning and expression vectors for use with
prokaryotic and eukaryotic hosts are described in Sambrook et al., Molecular Cloning: A
Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, New York (1989), the disclosure of which is hereby incorporated by reference in
its entirety.

Generally, recombinant expression vectors will include origins of replication and
selectable markers permitting transformation of the host cell, e.g., the ampicillin
resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a
highly expressed gene to direct transcription of a downstream structural sequence. Such
promoters can be derived from operons encoding glycolytic enzymes such as
3-phosphoglycerate kinase (PGK), alpha-factor, acid phosphatase, or heat shock proteins,
among others. The heterologous structural sequence is assembled in appropriate phase
with translation initiation and termination sequences, and preferably, a leader sequence
capable of directing secretion of translated protein into the periplasmic space or
extracellular medium. Optionally, the heterologous sequence can encode a fusion protein
including an N-terminal identification peptide imparting desired characteristics, e.g.,
stabilization or simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a structural
DNA sequence encoding a desired protein together with suitable translation initiation and
termination signals in operable reading phase with a functional promoter. The vector will
comprise one or more phenotypic selectable markers and an origin of replication to
ensure maintenance of the vector and, when desirable, provide amplification within the
host.

Suitable prokaryotic hosts for transformation include strains of E. coli, B. subtilis,
Salmonella typhimurium and various species within the genera Pseudomonas and
Streptomyces. Others may also be employed as a matter of choice.

As a representative but non-limiting example, useful expression vectors for
bacterial use can comprise a selectable marker and bacterial origin of replication derived
from commercially available plasmids comprising genetic elements of the well known
cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example,
pKK223-3 (available from Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1
(available from Promega Biotec, Madison, WI, USA). These pBR322 "backbone"
sections are combined with an appropriate promoter and the structural sequence to be
expressed.

Following transformation of a suitable host strain and growth of the host strain to
an appropriate cell density, the selected promoter, where it is inducible, is derepressed or
induced by appropriate means (e.g., temperature shift or chemical induction) and cells are
cultured for an additional period to provide for expression of the induced gene product.
Thereafter cells are typically harvested, generally by centrifugation, disrupted to release
expressed protein, generally by physical or chemical means, and the resulting crude
extract is retained for further purification.

Various mammalian cell culture systems can also be employed to express
recombinant protein. Examples of mammalian expression systems include the COS-7
lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:115 (1981), and other
cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO,
HeLa and BHK cell lines.

Mammalian expression vectors will comprise an origin of replication, a suitable
promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation
site, splice donor and acceptor sites, transcriptional termination sequences, and 5'
flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome,
for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites
may be used to provide the required nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are usually
isolated by initial extraction from cell pellets, followed by one or more salting-out,
aqueous ion exchange or size exclusion chromatography steps. Microbial cells employed
in expression of proteins can be disrupted by any convenient method, including
freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Protein
refolding steps can be used, as necessary, in completing configuration of the mature
protein. Finally, high performance liquid chromatography (HPLC) can be employed for
final purification steps.

The present invention further includes isolated polypeptides, proteins and nucleic
acid molecules which are substantially equivalent to those herein described. As used
herein, substantially equivalent can refer both to nucleic acid and amino acid sequences,
for example a mutant sequence, that varies from a reference sequence by one or more
substitutions, deletions, or additions, the net effect of which does not result in an adverse
functional dissimilarity between reference and subject sequences. For purposes of the
present invention, sequences having equivalent biological activity, and equivalent
expression characteristics are considered substantially equivalent. For purposes of
determining equivalence, truncation of the mature sequence should be disregarded.

The invention further provides methods of obtaining homologs from other
strains of Enterococcus faecalis, of the fragments of the Enterococcus faecalis genome of
the present invention and homologs of the proteins encoded by the ORFs of the present
invention. As used herein, a sequence or protein of Enterococcus faecalis is defined as a
homolog of a fragment of the Enterococcus faecalis fragments or contigs or a protein
encoded by one of the ORFs of the present invention, if it shares significant homology to
one of the fragments of the Enterococcus faecalis genome of the present invention or a
protein encoded by one of the ORFs of the present invention. Specifically, by using the
sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and
colony/plaque hybridization, one skilled in the art can obtain homologs.

As used herein, two nucleic acid molecules or proteins are said to "share
significant homology" if the two contain regions which possess greater than 85%
sequence (amino acid or nucleic acid) homology. Preferred homologs in this regard are
those with more than 90% homology. Especially preferred are those with 93% or more
homology. Among especially preferred homologs those with 95% or more homology are
particularly preferred. Very particularly preferred among these are those with 97% and
even more particularly preferred among those are homologs with 99% or more
homology. The most preferred homologs among these are those with 99.9% homology
or more. It will be understood that, among measures of homology, identity is particularly
preferred in this regard.
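
The nested preference levels recited above can be read as a simple threshold ladder. The sketch below is one possible reading, offered for illustration only: the function name and the returned labels are hypothetical, and the 97% level is assumed to mean "97% or more", which the text does not state explicitly.

```python
from typing import Optional


def homology_tier(percent: float) -> Optional[str]:
    """Map a percent homology value to the most specific preference level
    it satisfies, or None if it is not greater than 85%."""
    if percent >= 99.9:
        return "most preferred"
    if percent >= 99.0:
        return "even more particularly preferred"
    if percent >= 97.0:  # assumed to mean "97% or more"
        return "very particularly preferred"
    if percent >= 95.0:
        return "particularly preferred"
    if percent >= 93.0:
        return "especially preferred"
    if percent > 90.0:   # "more than 90%"
        return "preferred"
    if percent > 85.0:   # "greater than 85%"
        return "significant homology"
    return None
```

Note that the two lowest levels are strict inequalities in the text ("greater than 85%", "more than 90%"), whereas the higher levels are stated as "or more".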

Region specific primers or probes derived from the nucleotide sequence provided
in SEQ ID NOS: 1-982 or from a nucleotide sequence at least 95%, particularly at least
99%, especially at least 99.5% identical to a sequence of SEQ ID NOS: 1-982 can be used
to prime DNA synthesis and PCR amplification, as well as to identify colonies containing
cloned DNA encoding a homolog. Methods suitable to this aspect of the present
invention are well known and have been described in great detail in many publications
such as, for example, Innis et al., PCR Protocols, Academic Press, San Diego, CA
(1990).

When using primers derived from SEQ ID NOS: 1-982 or from a nucleotide
sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-982, one
skilled in the art will recognize that by employing high stringency conditions (e.g.,
annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at 50-65°C in 0.5X
SSPC) only sequences which are greater than 75% homologous to the primer will be
amplified. By employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X
SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences which are
greater than 40-50% homologous to the primer will also be amplified.

When using DNA probes derived from SEQ ID NOS: 1-982, or from a nucleotide
sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-982, for
colony/plaque hybridization, one skilled in the art will recognize that by employing high
stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and
washing at 50-65°C in 0.5X SSPC), sequences having regions which are greater than 90%
homologous to the probe can be obtained, and that by employing lower stringency
conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing
at 42°C in 0.5X SSPC), sequences having regions which are greater than 35-45%
homologous to the probe will be obtained.

Any organism can be used as the source for homologs of the present invention so
long as the organism naturally expresses such a protein or contains genes encoding the
same. The most preferred organisms for isolating homologs are bacteria which are closely
related to Enterococcus faecalis.

ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION 

Each ORF provided in Tables 1 and 2 is identified with a function by homology to
a known gene or polypeptide. As a result, one skilled in the art can use the polypeptides
of the present invention for commercial, therapeutic and industrial purposes consistent
with the type of putative identification of the polypeptide. Such identifications permit
one skilled in the art to use the Enterococcus faecalis ORFs in a manner similar to the
known type of sequences for which the identification is made; for example, to ferment a
particular sugar source or to produce a particular metabolite. A variety of reviews
illustrative of this aspect of the invention are available, including the following reviews
on the industrial use of enzymes, for example, BIOCHEMICAL ENGINEERING AND
BIOTECHNOLOGY HANDBOOK, 2nd Ed., MacMillan Publications, Ltd. NY (1991) and
BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., Elsevier Science
Publishers, Amsterdam, The Netherlands (1985). A variety of exemplary uses that
illustrate this and similar aspects of the present invention are discussed below.

1. Biosynthetic Enzymes 

Open reading frames encoding proteins that mediate the catalytic reactions of
intermediary and macromolecular metabolism, the biosynthesis of small molecules,
cellular processes and other functions can be used for industrial biosynthesis. These
include enzymes involved in the degradation of the intermediary products of metabolism,
enzymes involved in central intermediary metabolism, enzymes involved in respiration,
both aerobic and anaerobic, enzymes involved in fermentation, enzymes involved in ATP
proton motive force conversion, enzymes involved in broad regulatory function, enzymes
involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes
involved in cofactor and vitamin synthesis.

The various metabolic pathways present in Enterococcus faecalis can be identified
based on absolute nutritional requirements as well as by examining the various enzymes
identified in Tables 1-3 and SEQ ID NOS: 1-982.

Of particular interest are polypeptides involved in the degradation of
intermediary metabolites as well as non-macromolecular metabolism. Such enzymes
include amylases, glucose oxidases, and catalase.

Proteolytic enzymes are another class of commercially important enzymes.
Proteolytic enzymes find use in a number of industrial processes including the processing
of flax and other vegetable fibers, in the extraction, clarification and depectinization of
fruit juices, in the extraction of vegetable oil and in the maceration of fruits and
vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in
the food industry is provided in Rombouts et al., Symbiosis 21:19 (1986) and Voragen et
al. in Biocatalysts In Agricultural Biotechnology, Whitaker et al., Eds., American
Chemical Society Symposium Series 389:93 (1989).

The metabolism of sugars is an important aspect of the primary metabolism of
Enterococcus faecalis. Enzymes involved in the degradation of sugars, such as,
particularly, glucose, galactose, fructose and xylose, can be used in industrial
fermentation. Some of the important sugar transforming enzymes, from a commercial
viewpoint, include sugar isomerases such as glucose isomerase. Other metabolic enzymes
have found commercial use such as glucose oxidases which produce ketogulonic acid
(KGA). KGA is an intermediate in the commercial production of ascorbic acid using the
Reichstein's procedure, as described in Krueger et al., Biotechnology 6(A), Rhine et al.,
Eds., Verlag Press, Weinheim, Germany (1984).

Glucose oxidase (GOD) is commercially available and has been used in purified
form as well as in an immobilized form for the deoxygenation of beer. See, for instance,
Hartmeir et al., Biotechnology Letters 7:21 (1979). The most important application of
GOD is the industrial scale fermentation of gluconic acid. There is a market for gluconic
acids, which are used in the detergent, textile, leather, photographic, pharmaceutical,
food, feed and concrete industries, as described, for example, in Bigelis et al., beginning
on page 357 in GENE MANIPULATIONS AND FUNGI; Benett et al., Eds., Academic Press, New
York (1985). In addition to industrial applications, GOD has found applications in
medicine for quantitative determination of glucose in body fluids and, recently, in
biotechnology for analyzing syrups from starch and cellulose hydrolysates. This
application is described in Owusu et al., Biochem. et Biophysica. Acta. 872:23 (1986), for
instance.

The main sweetener used in the world today is sugar which comes from sugar beets
and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows
the largest expansion in the market today. Initially, soluble enzymes were used and later
immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of
Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Massachusetts
(1990)). Today, the use of glucose-produced high fructose syrups is by far the largest
industrial business using immobilized enzymes. A review of the industrial use of these
enzymes is provided by Jorgensen, Starch 40:307 (1988).

Proteinases, such as alkaline serine proteinases, are used as detergent additives and
thus represent one of the largest volumes of microbial enzymes used in the industrial
sector. Because of their industrial importance, there is a large body of published and
unpublished information regarding the use of these enzymes in industrial processes. (See
Faultman et al., Acid Proteases Structure Function and Biology, Tang, J., ed., Plenum
Press, New York (1977) and Godfrey et al., Industrial Enzymes, MacMillan Publishers,
Surrey, UK (1983) and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner &
Associates, London (1986)).

Another class of commercially usable proteins of the present invention are the
microbial lipases, described by, for instance, Macrae et al., Philosophical Transactions of
the Chiral Society of London 310:227 (1985) and Poserke, Journal of the American Oil
Chemist Society 61:M5S (1984). A major use of lipases is in the fat and oil industry for
the production of neutral glycerides using lipase catalyzed inter-esterification of readily
available triglycerides. Applications of lipases include the use as a detergent additive to
facilitate the removal of fats from fabrics in the course of the washing procedures.

The use of enzymes, and in particular microbial enzymes, as catalysts for key steps
in the synthesis of complex organic molecules is gaining popularity at a great rate. One
area of great interest is the preparation of chiral intermediates. Preparation of chiral
intermediates is of interest to a wide range of synthetic chemists, particularly those
scientists involved with the preparation of new pharmaceuticals, agrochemicals,
fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral
Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990)). The following
reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of
carboxylic acid esters, phosphate esters, amides and nitriles, esterification reactions,
trans-esterification reactions, synthesis of amides, reduction of alkanones and
oxoalkanates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to
sulfoxides, and carbon bond forming reactions such as the aldol reaction.

When considering the use of an enzyme encoded by one of the ORFs of the
present invention for biotransformation and organic synthesis it is sometimes necessary
to consider the respective advantages and disadvantages of using a microorganism as
opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one
hand or an isolated partially purified enzyme on the other hand have been described in
detail by Bud et al., Chemistry in Britain (1987), p. 127.

Amino transferases, enzymes involved in the biosynthesis and metabolism of
amino acids, are useful in the catalytic production of amino acids. The advantage of
using microbial based enzyme systems is that the amino transferase enzymes catalyze the
stereo-selective synthesis of only L-amino acids and generally possess uniformly high
catalytic rates. A description of the use of amino transferases for amino acid production
is provided by Roselle-David, Methods of Enzymology 136:419 (1987).

Another category of useful proteins encoded by the ORFs of the present
invention includes enzymes involved in nucleic acid synthesis, repair, and recombination.

2. Generation of Antibodies 

As described here, the proteins of the present invention, as well as homologs 
thereof, can be used in a variety of procedures and methods known in the art which are 
currently applied to other proteins. The proteins of the present invention can further be
used to generate an antibody which selectively binds the protein.

E. faecalis protein-specific antibodies for use in the present invention can be 
raised against the intact E. faecalis protein or an antigenic polypeptide fragment thereof, 
which may be presented together with a carrier protein, such as an albumin, to an animal 
system (such as rabbit or mouse) or, if it is long enough (at least about 25 amino acids),
without a carrier.

As used herein, the term "antibody" (Ab) or "monoclonal antibody" (Mab) is 
meant to include intact molecules, single chain whole antibodies, and antibody fragments. 
Antibody fragments of the present invention include Fab and F(ab')2 and other fragments 
including single-chain Fvs (scFv) and disulfide-linked Fvs (sdFv). Also included in the
present invention are chimeric and humanized monoclonal antibodies and polyclonal
antibodies specific for the polypeptides of the present invention. The antibodies of the
present invention may be prepared by any of a variety of methods. For example, cells
expressing a polypeptide of the present invention or an antigenic fragment thereof can
be administered to an animal in order to induce the production of sera containing
polyclonal antibodies. Alternatively, a preparation of E. faecalis polypeptide or fragment
thereof is prepared and purified to render it substantially free of natural contaminants. 
Such a preparation is then introduced into an animal in order to produce polyclonal 
antisera of greater specific activity. 

In a preferred method, the antibodies of the present invention are monoclonal
antibodies or binding fragments thereof. Such monoclonal antibodies can be prepared
using hybridoma technology. See, e.g., Harlow et al., ANTIBODIES: A LABORATORY
MANUAL, (Cold Spring Harbor Laboratory Press, 2nd ed. 1988); Hammerling, et al., in:
MONOCLONAL ANTIBODIES AND T-CELL HYBRIDOMAS 563-681 (Elsevier,
N.Y., 1981). Fab and F(ab')2 fragments may be produced by proteolytic cleavage, using
enzymes such as papain (to produce Fab fragments) or pepsin (to produce F(ab')2
fragments). Alternatively, E. faecalis polypeptide-binding fragments, chimeric, and
humanized antibodies can be produced through the application of recombinant DNA
technology or through synthetic chemistry using methods known in the art.

Alternatively, additional antibodies capable of binding to the polypeptide antigen
of the present invention may be produced in a two-step procedure through the use of
anti-idiotypic antibodies. Such a method makes use of the fact that antibodies are
themselves antigens, and that, therefore, it is possible to obtain an antibody which binds
to a second antibody. In accordance with this method, E. faecalis polypeptide-specific
antibodies are used to immunize an animal, preferably a mouse. The splenocytes of such
an animal are then used to produce hybridoma cells, and the hybridoma cells are screened
to identify clones which produce an antibody whose ability to bind to the E. faecalis
polypeptide-specific antibody can be blocked by the E. faecalis polypeptide antigen.
Such antibodies comprise anti-idiotypic antibodies to the E. faecalis polypeptide-specific
antibody and can be used to immunize an animal to induce formation of further E.
faecalis polypeptide-specific antibodies.

Antibodies and fragments thereof of the present invention may be described by
the portion of a polypeptide of the present invention recognized or specifically bound by
the antibody. Antibody binding fragments of a polypeptide of the present invention
may be described or specified in the same manner as for polypeptide fragments discussed
above, i.e., by N-terminal and C-terminal positions or by size in contiguous amino acid
residues. Any number of antibody binding fragments of a polypeptide of the present
invention, specified by N-terminal and C-terminal positions or by size in amino acid
residues, as described above, may also be excluded from the present invention. Therefore,
the present invention includes antibodies that specifically bind a particularly described
fragment of a polypeptide of the present invention and allows for the exclusion of the
same.

Antibodies and fragments thereof of the present invention may also be described
or specified in terms of their cross-reactivity. Antibodies and fragments that do not bind
polypeptides of any species of Enterococcus other than E. faecalis are included in
the present invention. Likewise, antibodies and fragments that bind only species of
Enterococcus, i.e., antibodies and fragments that do not bind bacteria from any genus
other than Enterococcus, are included in the present invention.

3. Diagnostic and Detection Assays and Kits 

The present invention further relates to methods for assaying enterococcal
infection in an animal by detecting the expression of genes encoding enterococcal
polypeptides of the present invention. The methods comprise analyzing tissue or body
fluid from the animal for Enterococcus-specific antibodies, nucleic acids, or proteins.
Nucleic acid specific to Enterococcus is assayed by PCR or hybridization
techniques using nucleic acid sequences of the present invention as either hybridization
probes or primers. See, e.g., Sambrook et al. Molecular Cloning: A Laboratory Manual
(Cold Spring Harbor Laboratory Press, 2nd ed., 1989, page 54 reference); Eremeeva et al.
(1994) J. Clin. Microbiol. 32:803-810 (describing differentiation among spotted fever
group Rickettsiae species by analysis of restriction fragment length polymorphism of
PCR-amplified DNA) and Chen et al. (1994) J. Clin. Microbiol. 32:589-595 (detecting
B. burgdorferi nucleic acids via PCR).

Where diagnosis of a disease state related to infection with Enterococcus has
already been made, the present invention is useful for monitoring progression or
regression of the disease state whereby patients exhibiting enhanced Enterococcus gene
expression will experience a worse clinical outcome relative to patients expressing these
gene(s) at a lower level.

By "biological sample" is intended any biological sample obtained from an animal,
cell line, tissue culture, or other source which contains Enterococcus polypeptide, mRNA,
or DNA. Biological samples include body fluids (such as saliva, blood, plasma, urine,
mucus, synovial fluid, etc.), tissues (such as muscle, skin, and cartilage) and any other
biological source suspected of containing Enterococcus polypeptides or nucleic acids.
Methods for obtaining biological samples such as tissue are well known in the art.

The present invention is useful for detecting diseases related to Enterococcus
infections in animals. Preferred animals include monkeys, apes, cats, dogs, birds, cows,
pigs, mice, horses, rabbits and humans. Particularly preferred are humans.

Total RNA can be isolated from a biological sample using any suitable technique
such as the single-step guanidinium-thiocyanate-phenol-chloroform method described in
Chomczynski et al. (1987) Anal. Biochem. 162:156-159. mRNA encoding Enterococcus
polypeptides having sufficient homology to the nucleic acid sequences identified in SEQ
ID NOS: 1-982 to allow for hybridization between complementary sequences is then
assayed using any appropriate method. These include Northern blot analysis, S1 nuclease
mapping, the polymerase chain reaction (PCR), reverse transcription in combination
with the polymerase chain reaction (RT-PCR), and reverse transcription in combination
with the ligase chain reaction (RT-LCR).

Northern blot analysis can be performed as described in Harada et al. (1990) Cell
63:303-312. Briefly, total RNA is prepared from a biological sample as described above.
For the Northern blot, the RNA is denatured in an appropriate buffer (such as
glyoxal/dimethyl sulfoxide/sodium phosphate buffer), subjected to agarose gel
electrophoresis, and transferred onto a nitrocellulose filter. After the RNAs have been
linked to the filter by a UV linker, the filter is prehybridized in a solution containing
formamide, SSC, Denhardt's solution, denatured salmon sperm DNA, SDS, and sodium
phosphate buffer. An E. faecalis polynucleotide sequence shown in SEQ ID NOS: 1-982
labeled according to any appropriate method (such as the ³²P-multiprimed DNA labeling
system (Amersham)) is used as probe. After hybridization overnight, the filter is washed
and exposed to x-ray film. DNA for use as probe according to the present invention is
described in the sections above and will preferably be at least 15 nucleotides in length.

S1 mapping can be performed as described in Fujita et al. (1987) Cell 49:357-367.
To prepare probe DNA for use in S1 mapping, the sense strand of an above-described E.
faecalis DNA sequence of the present invention is used as a template to synthesize labeled
antisense DNA. The antisense DNA can then be digested using an appropriate restriction
endonuclease to generate further DNA probes of a desired length. Such antisense probes
are useful for visualizing protected bands corresponding to the target mRNA (i.e., mRNA
encoding Enterococcus polypeptides).

Levels of mRNA encoding Enterococcus polypeptides are assayed, e.g., using
the RT-PCR method described in Makino et al. (1990) Technique 2:295-301. By this
method, the radioactivities of the "amplicons" in the polyacrylamide gel bands are
linearly related to the initial concentration of the target mRNA. Briefly, this method
involves adding total RNA isolated from a biological sample to a reaction mixture
containing an RT primer and appropriate buffer. After incubating for primer annealing,
the mixture can be supplemented with an RT buffer, dNTPs, DTT, RNase inhibitor and
reverse transcriptase. After incubation to achieve reverse transcription of the RNA, the
RT products are then subjected to PCR using labeled primers. Alternatively, rather than
labeling the primers, a labeled dNTP can be included in the PCR reaction mixture. PCR
amplification can be performed in a DNA thermal cycler according to conventional
techniques. After a suitable number of rounds to achieve amplification, the PCR reaction
mixture is electrophoresed on a polyacrylamide gel. After drying the gel, the
radioactivity of the appropriate bands (corresponding to the mRNA encoding the
Enterococcus polypeptides of the present invention) is quantified using an imaging
analyzer. RT and PCR reaction ingredients and conditions, reagent and gel
concentrations, and labeling methods are well known in the art. Variations on the
RT-PCR method will be apparent to the skilled artisan. Other PCR methods that can
detect the nucleic acid of the present invention can be found in PCR PRIMER: A
LABORATORY MANUAL (C.W. Dieffenbach et al. eds., Cold Spring Harbor Lab Press,
1995).

The polynucleotides of the present invention, including both DNA and RNA, may
be used to detect polynucleotides of the present invention or Enterococcal species
including E. faecalis using bio chip technology. The present invention includes both high
density chip arrays (>1000 oligonucleotides per cm²) and low density chip arrays (<1000
oligonucleotides per cm²). Bio chips comprising arrays of polynucleotides of the present
invention may be used to detect Enterococcal species, including E. faecalis, in biological
and environmental samples and to diagnose an animal, including humans, with an E.
faecalis or other Enterococcal infection. The bio chips of the present invention may
comprise polynucleotide sequences of other pathogens including bacterial, viral, parasitic,
and fungal polynucleotide sequences, in addition to the polynucleotide sequences of the
present invention, for use in rapid differential pathogenic detection and diagnosis. The
bio chips can also be used to monitor an E. faecalis or other Enterococcal infection and
to monitor the genetic changes (deletions, insertions, mismatches, etc.) in response to
drug therapy in the clinic and drug development in the laboratory. The bio chip
technology comprising arrays of polynucleotides of the present invention may also be
used to simultaneously monitor the expression of a multiplicity of genes, including those
of the present invention. The polynucleotides used to comprise a selected array may be
specified in the same manner as for the fragments, i.e., by their 5' and 3' positions or
length in contiguous base pairs. Methods and particular uses of the
polynucleotides of the present invention to detect Enterococcal species, including E.
faecalis, using bio chip technology include those known in the art and those of: U.S.
Patent Nos. 5510270, 5545531, 5445934, 5677195, 5532128, 5556752, 5527681,
5451683, 5424186, 5607646, 5658732 and World Patent Nos. WO/9710365,
WO/9511995, WO/9743447, WO/9535505, each incorporated herein in its entirety.
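
By way of illustration only, the two ways of specifying array polynucleotides described above (by 5' and 3' positions, or by length in contiguous bases) and the threshold of 1000 oligonucleotides per cm² separating high and low density arrays can be sketched in Python. The function names, the 1-based inclusive position convention, and the example sequence are assumptions made for this sketch and are not taken from the sequences of the present invention.

```python
# Illustrative sketch only: names, conventions, and data are hypothetical.

DENSITY_THRESHOLD = 1000  # oligonucleotides per cm^2, as stated in the text


def probe_by_positions(sequence, five_prime, three_prime):
    """Return the fragment from the 5' to the 3' position (assumed 1-based, inclusive)."""
    if not 1 <= five_prime <= three_prime <= len(sequence):
        raise ValueError("positions must satisfy 1 <= 5' <= 3' <= sequence length")
    return sequence[five_prime - 1:three_prime]


def probes_by_length(sequence, length):
    """Return every contiguous fragment of the given length, in 5' to 3' order."""
    if not 1 <= length <= len(sequence):
        raise ValueError("length must be between 1 and the sequence length")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def array_density_class(n_oligos, area_cm2):
    """Classify an array as 'high' (>1000 per cm^2) or 'low' (<1000 per cm^2)."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    density = n_oligos / area_cm2
    if density > DENSITY_THRESHOLD:
        return "high"
    if density < DENSITY_THRESHOLD:
        return "low"
    return "unspecified"  # the text does not classify exactly 1000 per cm^2


example = "ATGCGTACGTTAGC"  # hypothetical 14-base sequence
print(probe_by_positions(example, 3, 8))
print(len(probes_by_length(example, 12)))
print(array_density_class(5000, 2.0))
```

The position-based and length-based helpers are kept separate because the text offers them as alternatives; either may be used to enumerate candidate array elements.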

Biosensors using the polynucleotides of the present invention may also be used to
detect, diagnose, and monitor E. faecalis or other Enterococcal species and infections
thereof. Biosensors using the polynucleotides of the present invention may also be used
to detect particular polynucleotides of the present invention. Biosensors using the
polynucleotides of the present invention may also be used to monitor the genetic changes
(deletions, insertions, mismatches, etc.) in response to drug therapy in the clinic and drug
development in the laboratory. Methods and particular uses of the polynucleotides of the
present invention to detect Enterococcal species, including E. faecalis, using biosensors
include those known in the art and those of: U.S. Patent Nos. 5721102, 5658732,
5631170, and World Patent Nos. WO97/35011, WO/9720203, each incorporated herein
in its entirety.

Thus, the present invention includes both bio chips and biosensors comprising 
polynucleotides of the present invention and methods of their use. 

Assaying Enterococcus polypeptide levels in a biological sample can occur using
any art-known method, such as antibody-based techniques. For example, Enterococcus
polypeptide expression in tissues can be studied with classical immunohistological
methods. In these, the specific recognition is provided by the primary antibody
(polyclonal or monoclonal) but the secondary detection system can utilize fluorescent,
enzyme, or other conjugated secondary antibodies. As a result, an immunohistological
staining of tissue section for pathological examination is obtained. Tissues can also be
extracted, e.g., with urea and neutral detergent, for the liberation of Enterococcus
polypeptides for Western-blot or dot/slot assay. See, e.g., Jalkanen, M. et al. (1985) J.
Cell. Biol. 101:976-985; Jalkanen, M. et al. (1987) J. Cell. Biol. 105:3087-3096. In this
technique, which is based on the use of cationic solid phases, quantitation of an
Enterococcus polypeptide can be accomplished using an isolated Enterococcus
polypeptide as a standard. This technique can also be applied to body fluids.

Other antibody-based methods useful for detecting Enterococcus polypeptide gene
expression include immunoassays, such as the ELISA and the radioimmunoassay (RIA).
For example, Enterococcus polypeptide-specific monoclonal antibodies can be used
both as an immunoabsorbent and as an enzyme-labeled probe to detect and quantify an
Enterococcus polypeptide. The amount of an Enterococcus polypeptide present in the
sample can be calculated by reference to the amount present in a standard preparation
using a linear regression computer algorithm. Such an ELISA is described in Iacobelli et
al. (1988) Breast Cancer Research and Treatment 11:19-30. In another ELISA assay, two
distinct specific monoclonal antibodies can be used to detect Enterococcus polypeptides
in a body fluid. In this assay, one of the antibodies is used as the immunoabsorbent and
the other as the enzyme-labeled probe.
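
The calculation by reference to a standard preparation mentioned above can be illustrated with a minimal Python sketch of an ordinary least-squares standard curve. This is one possible form of a "linear regression computer algorithm" and is not asserted to be the one used in the cited reference; the standard concentrations, absorbance readings, units, and function names are hypothetical.

```python
# Illustrative sketch only: data, units, and function names are hypothetical.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standard concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate the amount giving a measured signal."""
    if slope == 0:
        raise ValueError("a flat standard curve cannot be inverted")
    return (signal - intercept) / slope


# Hypothetical standard preparation: known amounts (ng/mL) and their readings.
standard_conc = [0.0, 10.0, 20.0, 40.0]
standard_signal = [0.05, 0.25, 0.45, 0.85]

slope, intercept = fit_line(standard_conc, standard_signal)
sample_conc = concentration_from_signal(0.65, slope, intercept)
print(f"estimated amount in sample: {sample_conc:.1f} ng/mL")
```

A straight-line fit is only appropriate within the linear range of the assay; readings outside the range of the standards should not be extrapolated.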

The above techniques may be conducted essentially as a "one-step" or "two-step"
assay. The "one-step" assay involves contacting the Enterococcus polypeptide with
immobilized antibody and, without washing, contacting the mixture with the labeled
antibody. The "two-step" assay involves washing before contacting the mixture with the
labeled antibody. Other conventional methods may also be employed as suitable. It is
usually desirable to immobilize one component of the assay system on a support, thereby
allowing other components of the system to be brought into contact with the component
and readily removed from the sample. Variations of the above and other immunological
methods included in the present invention can also be found in Harlow et al.,
ANTIBODIES: A LABORATORY MANUAL, (Cold Spring Harbor Laboratory Press, 2nd
ed. 1988).

Suitable enzyme labels include, for example, those from the oxidase group, which
catalyze the production of hydrogen peroxide by reacting with substrate. Glucose oxidase
is particularly preferred as it has good stability and its substrate (glucose) is readily
available. Activity of an oxidase label may be assayed by measuring the concentration of
hydrogen peroxide formed by the enzyme-labeled antibody/substrate reaction. Besides
enzymes, other suitable labels include radioisotopes, such as iodine (¹²⁵I, ¹²¹I), carbon
(¹⁴C), sulphur (³⁵S), tritium (³H), indium (¹¹²In), and technetium (⁹⁹ᵐTc), and fluorescent
labels, such as fluorescein and rhodamine, and biotin.

Further suitable labels for the Enterococcus polypeptide-specific antibodies of the
present invention are provided below. Examples of suitable enzyme labels include malate
dehydrogenase, Enterococcal nuclease, delta-5-steroid isomerase, yeast-alcohol
dehydrogenase, alpha-glycerol phosphate dehydrogenase, triose phosphate isomerase,
peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase,
ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase, and
acetylcholine esterase.

Examples of suitable radioisotopic labels include ³H, ¹¹¹In, ¹²⁵I, ¹³¹I, ³²P, ³⁵S, ¹⁴C,
⁵¹Cr, ⁵⁷To, ⁵⁸Co, ⁵⁹Fe, ⁷⁵Se, ¹⁵²Eu, ⁹⁰Y, ⁶⁷Cu, ²¹⁷Ci, ²¹¹At, ²¹²Pb, ⁴⁷Sc, ¹⁰⁹Pd, etc. ¹¹¹In is a
preferred isotope where in vivo imaging is used since it avoids the problem of
dehalogenation of the ¹²⁵I- or ¹³¹I-labeled monoclonal antibody by the liver. In addition,
this radionuclide has a more favorable gamma emission energy for imaging. See, e.g.,
Perkins et al. (1985) Eur. J. Nucl. Med. 10:296-301; Carasquillo et al. (1987) J. Nucl.
Med. 28:281-287. For example, ¹¹¹In coupled to monoclonal antibodies with
1-(p-isothiocyanatobenzyl)-DPTA has shown little uptake in non-tumorous tissues,
particularly the liver, and therefore enhances specificity of tumor localization. See
Esteban et al. (1987) J. Nucl. Med. 28:861-870.

Examples of suitable non-radioactive isotopic labels include ¹⁵⁷Gd, ⁵⁵Mn, ¹⁶²Dy,
⁵²Tr, and ⁵⁶Fe.

Examples of suitable fluorescent labels include a ¹⁵²Eu label, a fluorescein label,
an isothiocyanate label, a rhodamine label, a phycoerythrin label, a phycocyanin label, an 
allophycocyanin label, an o-phthaldehyde label, and a fluorescamine label. 

Examples of suitable toxin labels include Pseudomonas toxin, diphtheria toxin,
ricin, and cholera toxin.

Examples of chemiluminescent labels include a luminol label, an isoluminol label,
an aromatic acridinium ester label, an imidazole label, an acridinium salt label, an oxalate 
ester label, a luciferin label, a luciferase label, and an aequorin label. 

Examples of nuclear magnetic resonance contrasting agents include heavy metal
nuclei such as Gd, Mn, and iron.

Typical techniques for binding the above-described labels to antibodies are
provided by Kennedy et al. (1976) Clin. Chim. Acta 70:1-31, and Schurs et al. (1977)
Clin. Chim. Acta 81:1-40. Coupling techniques mentioned in the latter are the
glutaraldehyde method, the periodate method, the dimaleimide method, and the
m-maleimidobenzyl-N-hydroxy-succinimide ester method, all of which methods are
incorporated by reference herein.

In a related aspect, the invention includes a diagnostic kit for use in screening
serum containing antibodies specific against E. faecalis infection. Such a kit may include
an isolated E. faecalis antigen comprising an epitope which is specifically
immunoreactive with at least one anti-E. faecalis antibody. Such a kit also includes
means for detecting the binding of said antibody to the antigen. In specific embodiments,
the kit may include a recombinantly produced or chemically synthesized peptide or
polypeptide antigen. The peptide or polypeptide antigen may be attached to a solid
support.

In a more specific embodiment, the detecting means of the above-described kit
includes a solid support to which said peptide or polypeptide antigen is attached. Such a
kit may also include a non-attached reporter-labeled anti-human antibody. In this
embodiment, binding of the antibody to the E. faecalis antigen can be detected by binding
of the reporter-labeled antibody to the anti-E. faecalis polypeptide antibody.

In a related aspect, the invention includes a method of detecting E. faecalis
infection in a subject. This detection method includes reacting a body fluid, preferably
serum, from the subject with an isolated E. faecalis antigen, and examining the antigen
for the presence of bound antibody. In a specific embodiment, the method includes a
polypeptide antigen attached to a solid support, and serum is reacted with the support.
Subsequently, the support is reacted with a reporter-labeled anti-human antibody. The
support is then examined for the presence of reporter-labeled antibody.

The solid surface reagent employed in the above assays and kits is prepared by
known techniques for attaching protein material to solid support material, such as
polymeric beads, dip sticks, 96-well plates or filter material. These attachment methods
generally include non-specific adsorption of the protein to the support or covalent
attachment of the protein, typically through a free amine group, to a chemically
reactive group on the solid support, such as an activated carboxyl, hydroxyl, or aldehyde
group. Alternatively, streptavidin-coated plates can be used in conjunction with
biotinylated antigen(s).

The polypeptides and antibodies of the present invention, including fragments
thereof, may be used to detect Enterococcal species including E. faecalis using bio chip
and biosensor technology. Bio chips and biosensors of the present invention may
comprise the polypeptides of the present invention to detect antibodies which
specifically recognize Enterococcal species, including E. faecalis. Bio chips and biosensors
of the present invention may also comprise antibodies which specifically recognize the
polypeptides of the present invention to detect Enterococcal species, including E.
faecalis, or specific polypeptides of the present invention. Bio chips or biosensors
comprising polypeptides or antibodies of the present invention may be used to detect
Enterococcal species, including E. faecalis, in biological and environmental samples and
to diagnose an animal, including humans, with an E. faecalis or other Enterococcal
infection. Thus, the present invention includes both bio chips and biosensors comprising
polypeptides or antibodies of the present invention and methods of their use.

The bio chips of the present invention may further comprise polypeptide
sequences of other pathogens including bacterial, viral, parasitic, and fungal polypeptide
sequences, in addition to the polypeptide sequences of the present invention, for use in
rapid differential pathogenic detection and diagnosis. The bio chips of the present
invention may further comprise antibodies or fragments thereof specific for other
pathogens including bacterial, viral, parasitic, and fungal polypeptide sequences, in addition
to the antibodies or fragments thereof of the present invention, for use in rapid
differential pathogenic detection and diagnosis. The bio chips and biosensors of the
present invention may also be used to monitor an E. faecalis or other Enterococcal
infection and to monitor the genetic changes (amino acid deletions, insertions,
substitutions, etc.) in response to drug therapy in the clinic and drug development in the
laboratory. The bio chips and biosensors comprising polypeptides or antibodies of the
present invention may also be used to simultaneously monitor the expression of a
multiplicity of polypeptides, including those of the present invention. The polypeptides
used to comprise a bio chip or biosensor of the present invention may be specified in the
same manner as for the fragments, i.e., by their N-terminal and C-terminal positions or
length in contiguous amino acid residues. Methods and particular uses of the polypeptides
and antibodies of the present invention to detect Enterococcal species, including E.
faecalis, or specific polypeptides using bio chip and biosensor technology include those
known in the art, those of the U.S. Patent Nos. and World Patent Nos. listed above for
bio chips and biosensors using polynucleotides of the present invention, and those of:
U.S. Patent Nos. 5658732, 5135852, 5567301, 5677196, 5690894 and World Patent
Nos. WO9729366, WO9612957, each incorporated herein in its entirety.

4. Screening Assay for Binding Agents 

Using the isolated proteins of the present invention, the present invention
further provides methods of obtaining and identifying agents which bind to a protein
encoded by one of the ORFs of the present invention or to one of the
Enterococcus faecalis fragments and contigs herein described.
In general, such methods comprise steps of:

(a) contacting an agent with an isolated protein encoded by one of the ORFs of
the present invention, or an isolated fragment of the Enterococcus faecalis genome; and
(b) determining whether the agent binds to said protein or said fragment.

The agents screened in the above assay can be, but are not limited to, peptides,
carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be
selected and screened at random or rationally selected or designed using protein modeling
techniques.

For random screening, agents such as peptides, carbohydrates, pharmaceutical 
agents and the like are selected at random and are assayed for their ability to bind to the 
protein encoded by the ORF of the present invention. 

Alternatively, agents may be rationally selected or designed. As used herein, an
agent is said to be "rationally selected or designed" when the agent is chosen based on the
configuration of the particular protein. For example, one skilled in the art can readily
adapt currently available procedures to generate peptides, pharmaceutical agents and the
like capable of binding to a specific peptide sequence in order to generate rationally
designed antipeptide peptides (see, for example, Hurby et al., "Application of Synthetic
Peptides: Antisense Peptides," in Synthetic Peptides, A User's Guide, W. H. Freeman, NY
(1992), pp. 289-307, and Kaspczak et al., Biochemistry 25:9230-8 (1989)), or
pharmaceutical agents, or the like.

In addition to the foregoing, one class of agents of the present invention, as
broadly described, can be used to control gene expression through binding to one of the
ORFs or EMFs of the present invention. As described above, such agents can be
randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a
skilled artisan to design sequence specific or element specific agents, modulating the
expression of either a single ORF or multiple ORFs which rely on the same EMF for
expression control.

One class of DNA binding agents are agents which contain base residues which
hybridize or form a triple helix by binding to DNA or RNA. Such agents can be based on
the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or
polymeric derivatives which have base attachment capacity.

Agents suitable for use in these methods usually contain 20 to 40 bases and are
designed to be complementary to a region of the gene involved in transcription (triple
helix - see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456
(1988); and Dervan et al., Science 257:1360 (1991)) or to the mRNA itself (antisense -
Okano, J. Neurochem. 55:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of
Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix formation optimally
results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization
blocks translation of an mRNA molecule into polypeptide. Both techniques have been
demonstrated to be effective in model systems. Information contained in the sequences
of the present invention can be used to design antisense and triple helix-forming
oligonucleotides, and other DNA binding agents.
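
As a minimal sketch of how sequence information might be used to derive an antisense oligonucleotide of the 20 to 40 base length mentioned above, the following Python example takes the reverse complement of a chosen window of a coding (sense) strand, which is complementary to the corresponding region of the mRNA. The function names and the example sequence are hypothetical, and the sketch ignores practical design considerations such as target accessibility and oligonucleotide chemistry.

```python
# Illustrative sketch only: the sequence and function names are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    dna = dna.upper()
    unknown = set(dna) - set(COMPLEMENT)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def antisense_oligo(coding_strand, start, length=20):
    """Return a DNA oligo (5' to 3') complementary to the mRNA region that
    corresponds to coding_strand[start:start + length] (0-based start)."""
    if not 20 <= length <= 40:
        raise ValueError("length should be 20 to 40 bases, per the text")
    target = coding_strand[start:start + length]
    if start < 0 or len(target) < length:
        raise ValueError("window extends beyond the sequence")
    return reverse_complement(target)


# Hypothetical 30-base coding strand, not a sequence of the invention.
coding = "ATGGCTAAAGGTGAAGAACTGTTCACCGGT"
oligo = antisense_oligo(coding, 0, 20)
print(oligo)
```

Because the mRNA has the same sequence as the coding strand (with U in place of T), the reverse complement of the coding-strand window pairs with the mRNA in antiparallel orientation.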

5. Pharmaceutical Compositions and Vaccines 

The present invention further provides pharmaceutical agents which can be used
to modulate the growth or pathogenicity of Enterococcus faecalis, or another related
organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a
composition of matter which can be formulated using known techniques to provide a
pharmaceutical composition. As used herein, the "pharmaceutical agents of the present
invention" refers to the pharmaceutical agents which are derived from the proteins encoded
by the ORFs of the present invention or are agents which are identified using the herein
described assays.

As used herein, a pharmaceutical agent is said to "modulate the growth and/or
pathogenicity of Enterococcus faecalis or a related organism, in vivo or in vitro" when
the agent reduces the rate of growth, rate of division, or viability of the organism in
question. The pharmaceutical agents of the present invention can modulate the growth
or pathogenicity of an organism in many fashions, although an understanding of the
underlying mechanism of action is not needed to practice the use of the pharmaceutical
agents of the present invention. Some agents will modulate the growth by binding to an
important protein, thus blocking the biological activity of the protein, while other agents
may bind to a component of the outer surface of the organism, blocking attachment or
rendering the organism more prone to attack by the body's natural immune system.
Alternatively, the agent may comprise a protein encoded by one of the ORFs of the
present invention and serve as a vaccine. The development and use of a vaccine based on
outer membrane components are well known in the art.

As used herein, a "related organism" is a broad term which refers to any organism
whose growth can be modulated by one of the pharmaceutical agents of the present
invention. In general, such an organism will contain a homolog of the protein which is
the target of the pharmaceutical agent or the protein used as a vaccine. As such, related
organisms do not need to be bacterial but may be fungal or viral pathogens.

The pharmaceutical agents and compositions of the present invention may be
administered in a convenient manner, such as by the oral, topical, intravenous,
intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The
pharmaceutical compositions are administered in an amount which is effective for
treating and/or prophylaxis of the specific indication. In general, they are administered
in an amount of at least about 1 mg/kg body weight and in most cases they will be
administered in an amount not in excess of about 1 g/kg body weight per day. In most
cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body weight daily, taking into
account the routes of administration, symptoms, etc.

The agents of the present invention can be used in native form or can be modified
to form a chemical derivative. As used herein, a molecule is said to be a "chemical
derivative" of another molecule when it contains additional chemical moieties not
normally a part of the molecule. Such moieties may improve the molecule's solubility,
absorption, biological half-life, etc. The moieties may alternatively decrease the toxicity
of the molecule, eliminate or attenuate any undesirable side effect of the molecule, etc.
Moieties capable of mediating such effects are disclosed in, among other sources,
REMINGTON'S PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein.

For example, such moieties may change an immunological character of the 
functional derivative, such as affinity for a given antibody. Such changes in 
immunomodulation activity are measured by the appropriate assay, such as a competitive 
type immunoassay. Modifications of such protein properties as redox or thermal 

15 stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or 
the tendency to aggregate with carriers or into multimers also may be effected in this way 
and can be assayed by methods well known to the skilled artisan. 

The therapeutic effects of the agents of the present invention may be obtained by
providing the agent to a patient by any suitable means (e.g., inhalation, intravenously,
intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer
the agent of the present invention so as to achieve an effective concentration within the
blood or tissue in which the growth of the organism is to be controlled. To achieve an
effective blood concentration, the preferred method is to administer the agent by
injection. The administration may be by continuous infusion, or by single or multiple
injections.

In providing a patient with one of the agents of the present invention, the dosage 
of the administered agent will vary depending upon such factors as the patient's age, 
weight, height, sex, general medical condition, previous medical history, etc. In general, 
it is desirable to provide the recipient with a dosage of agent which is in the range of from 

30 about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage 
may be administered. The therapeutically effective dose can be lowered by using 
combinations of the agents of the present invention or another agent. 

As used herein, two or more compounds or agents are said to be administered "in 
combination" with each other when either (1) the physiological effects of each 

35 compound, or (2) the serum concentrations of each compound can be measured at the 
same time. The composition of the present invention can be administered concurrently 
with, prior to, or following the administration of the other agent. 

The agents of the present invention are intended to be provided to recipient
subjects in an amount sufficient to decrease the rate of growth (as defined above) of the
target organism.

The administration of the agent(s) of the invention may be for either a
"prophylactic" or "therapeutic" purpose. When provided prophylactically, the agent(s)
are provided in advance of any symptoms indicative of the organism's growth. The
prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the
rate of onset of any subsequent infection. When provided therapeutically, the agent(s)
are provided at (or shortly after) the onset of an indication of infection. The therapeutic
administration of the agent(s) serves to attenuate the pathological symptoms of the
infection and to increase the rate of recovery.

The agents of the present invention are administered to a subject, such as a 
mammal, or a patient, in a pharmaceutically acceptable form and in a therapeutically 
effective concentration. A composition is said to be "pharmacologically acceptable" if 

15 its administration can be tolerated by a recipient patient. Such an agent is said to be 
administered in a "therapeutically effective amount" if the amount administered is 
physiologically significant. An agent is physiologically significant if its presence results 
in a detectable change in the physiology of a recipient patient. 

The agents of the present invention can be formulated according to known
methods to prepare pharmaceutically useful compositions, whereby these materials, or
their functional derivatives, are combined in a mixture with a pharmaceutically acceptable
carrier vehicle. Suitable vehicles and their formulation, inclusive of other human
proteins, e.g., human serum albumin, are described, for example, in REMINGTON'S
PHARMACEUTICAL SCIENCES, 16th Ed., Osol, A., Ed., Mack Publishing, Easton PA
(1980). In order to form a pharmaceutically acceptable composition suitable for
effective administration, such compositions will contain an effective amount of one or
more of the agents of the present invention, together with a suitable amount of carrier
vehicle.

Additional pharmaceutical methods may be employed to control the duration of
action. Controlled release preparations may be achieved through the use of polymers to
complex or absorb one or more of the agents of the present invention. The controlled
delivery may be effectuated by a variety of well known techniques, including formulation
with macromolecules such as, for example, polyesters, polyamino acids,
polyvinylpyrrolidone, ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or
protamine sulfate, adjusting the concentration of the macromolecules and the agent in
the formulation, and by appropriate use of methods of incorporation, which can be
manipulated to effectuate a desired time course of release. Another possible method to
control the duration of action by controlled release preparations is to incorporate agents
of the present invention into particles of a polymeric material such as polyesters,
polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers.
Alternatively, instead of incorporating these agents into polymeric particles, it is possible
to entrap these materials in microcapsules prepared, for example, by coacervation
techniques or by interfacial polymerization with, for example, hydroxymethylcellulose or
gelatine-microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in
colloidal drug delivery systems, for example, liposomes, albumin microspheres,
microemulsions, nanoparticles, and nanocapsules or in macroemulsions. Such techniques
are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES (1980).

The invention further provides a pharmaceutical pack or kit comprising one or
more containers filled with one or more of the ingredients of the pharmaceutical
compositions of the invention. Associated with such container(s) can be a notice in the
form prescribed by a governmental agency regulating the manufacture, use or sale of
pharmaceuticals or biological products, which notice reflects approval by the agency of
manufacture, use or sale for human administration.

In addition, the agents of the present invention may be employed in conjunction 
with other therapeutic compounds. 

The present invention also provides vaccines comprising one or more 

20 polypeptides of the present invention. Heterogeneity in the composition of a vaccine 
may be provided by combining E. faecalis polypeptides of the present invention. 
Multi-component vaccines of this type are desirable because they are likely to be more 
effective in eliciting protective immune responses against multiple species and strains of 
the Enterococcus genus than single polypeptide vaccines. 

Multi-component vaccines are known in the art to elicit antibody production to
numerous immunogenic components. See, e.g., Decker et al. (1996) J. Infect. Dis.
174:S270-275. In addition, a hepatitis B, diphtheria, tetanus, pertussis tetravalent
vaccine has recently been demonstrated to elicit protective levels of antibodies in human
infants against all four pathogenic agents. See, e.g., Aristegui, J. et al. (1997) Vaccine
15:7-9.

The present invention in addition to single-component vaccines includes 
multi-component vaccines. These vaccines comprise more than one polypeptide, 
immunogen or antigen. Thus, a multi-component vaccine would be a vaccine comprising 
more than one of the E. faecalis polypeptides of the present invention. 

Further within the scope of the invention are whole cell and whole viral vaccines.
Such vaccines may be produced recombinantly and involve the expression of one or more
of the E. faecalis polypeptides described in SEQ ID NOS: 1-982. For example, the E.
faecalis polypeptides of the present invention may be either secreted or localized
intracellularly, on the cell surface, or in the periplasmic space. Further, when a
recombinant virus is used, the E. faecalis polypeptides of the present invention may, for
example, be localized in the viral envelope, on the surface of the capsid, or internally
within the capsid. Whole cell vaccines which employ cells expressing heterologous
proteins are known in the art. See, e.g., Robinson, K. et al. (1997) Nature Biotech.
15:653-657; Sirard, J. et al. (1997) Infect. Immun. 65:2029-2033; Chabalgoity, J. et al.
(1997) Infect. Immun. 65:2402-2412. These cells may be administered live or may be
killed prior to administration. Chabalgoity, J. et al., supra, for example, report the
successful use in mice of a live attenuated Salmonella vaccine strain which expresses a
portion of a platyhelminth fatty acid-binding protein as a fusion protein on its cell
surface.

A multi-component vaccine can also be prepared using techniques known in the 
art by combining one or more E. faecalis polypeptides of the present invention, or 

15 fragments thereof, with additional non-Enterococcal components (e.g., diphtheria toxin 
or tetanus toxin, and/or other compounds known to elicit an immune response). Such 
vaccines are useful for eliciting protective immune responses to both members of the 
Enterococcus genus and non-Enterococcal pathogenic agents. 

The vaccines of the present invention also include DNA vaccines. DNA vaccines
are currently being developed for a number of infectious diseases. See, e.g., Boyer et al.
(1997) Nat. Med. 3:526-532; reviewed in Spier, R. (1996) Vaccine 14:1285-1288. Such
DNA vaccines contain a nucleotide sequence encoding one or more E. faecalis
polypeptides of the present invention oriented in a manner that allows for expression of
the subject polypeptide. For example, the direct administration of plasmid DNA
encoding B. burgdorferi OspA has been shown to elicit protective immunity in mice
against borrelial challenge. See, Luke et al. (1997) J. Infect. Dis. 175:91-97.

The present invention also relates to the administration of a vaccine which is
co-administered with a molecule capable of modulating immune responses. Kim et al.
(1997) Nature Biotech. 15:641-646, for example, report the enhancement of immune
responses produced by DNA immunizations when DNA sequences encoding molecules
which stimulate the immune response are co-administered. In a similar fashion, the
vaccines of the present invention may be co-administered with either nucleic acids
encoding immune modulators or the immune modulators themselves. These immune
modulators include granulocyte macrophage colony stimulating factor (GM-CSF) and
CD86.

The vaccines of the present invention may be used to confer resistance to
Enterococcal infection by either passive or active immunization. When the vaccines of
the present invention are used to confer resistance to Enterococcal infection through
active immunization, a vaccine of the present invention is administered to an animal to
elicit a protective immune response which either prevents or attenuates an Enterococcal
infection. When the vaccines of the present invention are used to confer resistance to
Enterococcal infection through passive immunization, the vaccine is provided to a host
animal (e.g., human, dog, or mouse), and the antisera elicited by this vaccine is recovered
and directly provided to a recipient suspected of having an infection caused by a member
of the Enterococcus genus.

The ability to label antibodies, or fragments of antibodies, with toxin molecules 

10 provides an additional method for treating Enterococcal infections when passive 

immunization is conducted. In this embodiment, antibodies, or fragments of antibodies, 
capable of recognizing the E. faecalis polypeptides disclosed herein, or fragments thereof, 
as well as other Enterococcus proteins, are labeled with toxin molecules prior to their 
administration to the patient. When such toxin derivatized antibodies bind to 

15 Enterococcus cells, toxin moieties will be localized to these cells and will cause their 
death. 

The present invention thus concerns and provides a means for preventing or
attenuating an Enterococcal infection resulting from organisms which have antigens that
are recognized and bound by antisera produced in response to the polypeptides of the
present invention. As used herein, a vaccine is said to prevent or attenuate a disease if its
administration to an animal results either in the total or partial attenuation (i.e.,
suppression) of a symptom or condition of the disease, or in the total or partial immunity
of the animal to the disease.

The administration of the vaccine (or the antisera which it elicits) may be for
either a "prophylactic" or "therapeutic" purpose. When provided prophylactically, the
compound(s) are provided in advance of any symptoms of Enterococcal infection. The
prophylactic administration of the compound(s) serves to prevent or attenuate any
subsequent infection. When provided therapeutically, the compound(s) are provided upon
or after the detection of symptoms which indicate that an animal may be infected with a
member of the Enterococcus genus. The therapeutic administration of the compound(s)
serves to attenuate any actual infection. Thus, the E. faecalis polypeptides, and
fragments thereof, of the present invention may be provided either prior to the onset of
infection (so as to prevent or attenuate an anticipated infection) or after the initiation of
an actual infection.

The polypeptides of the invention, whether encoding a portion of a native
protein or a functional derivative thereof, may be administered in pure form or may be
coupled to a macromolecular carrier. Examples of such carriers are proteins and
carbohydrates. Suitable proteins which may act as macromolecular carriers for enhancing
the immunogenicity of the polypeptides of the present invention include keyhole limpet
hemocyanin (KLH), tetanus toxoid, pertussis toxin, bovine serum albumin, and ovalbumin.
Methods for coupling the polypeptides of the present invention to such macromolecular
carriers are disclosed in Harlow et al., ANTIBODIES: A LABORATORY MANUAL,
(Cold Spring Harbor Laboratory Press, 2nd ed. 1988).

A composition is said to be "pharmacologically or physiologically acceptable" if 
its administration can be tolerated by a recipient animal and is otherwise suitable for 
administration to that animal. Such an agent is said to be administered in a 

10 "therapeutically effective amount" if the amount administered is physiologically 

significant. An agent is physiologically significant if its presence results in a detectable 
change in the physiology of a recipient patient. 

While in all instances the vaccine of the present invention is administered as a
pharmacologically acceptable compound, one skilled in the art would recognize that the
composition of a pharmacologically acceptable compound varies with the animal to
which it is administered. For example, a vaccine intended for human use will generally
not be co-administered with Freund's adjuvant. Further, the level of purity of the E.
faecalis polypeptides of the present invention will normally be higher when administered
to a human than when administered to a non-human animal.

As would be understood by one of ordinary skill in the art, when the vaccine of
the present invention is provided to an animal, it may be in a composition which may
contain salts, buffers, adjuvants, or other substances which are desirable for improving the
efficacy of the composition. Adjuvants are substances that can be used to specifically
augment a specific immune response. These substances generally perform two functions:
(1) they protect the antigen(s) from being rapidly catabolized after administration and (2)
they nonspecifically stimulate immune responses.

Normally, the adjuvant and the composition are mixed prior to presentation to
the immune system, or presented separately, but into the same site of the animal being
immunized. Adjuvants can be loosely divided into several groups based upon their
composition. These groups include oil adjuvants (for example, Freund's complete and
incomplete), mineral salts (for example, AlK(SO4)2, AlNa(SO4)2, AlNH4(SO4), silica,
kaolin, and carbon), polynucleotides (for example, poly IC and poly AU acids), and
certain natural substances (for example, wax D from Mycobacterium tuberculosis, as well
as substances found in Corynebacterium parvum, or Bordetella pertussis, and members of
the genus Brucella). Other substances useful as adjuvants are the saponins such as, for
example, Quil A (Superfos A/S, Denmark). Preferred adjuvants for use in the present
invention include aluminum salts, such as AlK(SO4)2, AlNa(SO4)2, and AlNH4(SO4).

Examples of materials suitable for use in vaccine compositions are provided in
REMINGTON'S PHARMACEUTICAL SCIENCES 1324-1341 (A. Osol, ed., Mack
Publishing Co., Easton, PA (1980)) (incorporated herein by reference).

The therapeutic compositions of the present invention can be administered
parenterally by injection, rapid infusion, nasopharyngeal absorption
(intranasopharyngeally), dermoabsorption, or orally. The compositions may
alternatively be administered intramuscularly, or intravenously. Compositions for
parenteral administration include sterile aqueous or non-aqueous solutions, suspensions,
and emulsions. Examples of non-aqueous solvents are propylene glycol, polyethylene
glycol, vegetable oils such as olive oil, and injectable organic esters such as ethyl oleate.
Carriers or occlusive dressings can be used to increase skin permeability and enhance
antigen absorption. Liquid dosage forms for oral administration may generally comprise a
liposome solution containing the liquid dosage form. Suitable forms for suspending
liposomes include emulsions, suspensions, solutions, syrups, and elixirs containing inert
diluents commonly used in the art, such as purified water. Besides the inert diluents, such
compositions can also include adjuvants, wetting agents, emulsifying and suspending
agents, or sweetening, flavoring, or perfuming agents.

Therapeutic compositions of the present invention can also be administered in
encapsulated form. For example, intranasal immunization can be performed using
vaccines encapsulated in biodegradable microspheres composed of
poly(DL-lactide-co-glycolide). See, Shahin, R. et al. (1995) Infect. Immun. 63:1195-1200.
Similarly, orally administered encapsulated Salmonella typhimurium antigens can also
be used. Allaoui-Attarki, K. et al. (1997) Infect. Immun. 65:853-857. Encapsulated
vaccines of the present invention can be administered by a variety of routes including
those involving contacting the vaccine with mucous membranes (e.g., intranasally,
intracolonically, intraduodenally).

Many different techniques exist for the timing of the immunizations when a 
multiple administration regimen is utilized. It is possible to use the compositions of the 
invention more than once to increase the levels and diversities of expression of the 
immunoglobulin repertoire expressed by the immunized animal. Typically, if multiple 

30 immunizations are given, they will be given one to two months apart. 

According to the present invention, an "effective amount" of a therapeutic 
composition is one which is sufficient to achieve a desired biological effect. Generally, 
the dosage needed to provide an effective amount of the composition will vary depending 
upon such factors as the animal's or human's age, condition, sex, and extent of disease, if 

35 any, and other variables which can be adjusted by one of ordinary skill in the art. 

The antigenic preparations of the invention can be administered by either single
or multiple dosages of an effective amount. Effective amounts of the compositions of
the invention can vary from 0.01-1,000 µg/ml per dose, more preferably 0.1-500 µg/ml
per dose, and most preferably 10-300 µg/ml per dose.

6. Shot-Gun Approach to Megabase DNA Sequencing 

5 The present invention further demonstrates that a large genome can be sequenced 

using a random shotgun approach. This procedure, described in detail in the examples 
that follow, has eliminated the up front cost of isolating and ordering overlapping or 
contiguous subclones prior to the start of the sequencing protocols. 

Certain aspects of the present invention are described in greater detail in the 
10 examples that follow. The examples are provided by way of illustration. Other aspects 
and embodiments of the present invention are contemplated by the inventors, as will be 
clear to those of skill in the art from reading the present disclosure. 



ILLUSTRATIVE EXAMPLES 


LIBRARIES AND SEQUENCING 

1. Shotgun Sequencing Probability Analysis 

The overall strategy for a shotgun approach to whole genome sequencing follows
from the Lander and Waterman (Lander and Waterman, Genomics 2:231 (1988))
application of the equation for the Poisson distribution. According to this treatment, the
probability, P0, that any given base in a sequence of size L, in nucleotides, is not
sequenced after a certain amount, n, in nucleotides, of random sequence has been
determined can be calculated by the equation P0 = e^-m, where m is n/L, the fold
coverage. For instance, for a genome of 2.8 Mb, m = 1 when 2.8 Mb of sequence has been
randomly generated (1X coverage). At that point, P0 = e^-1 = 0.37. The probability that
any given base has not been sequenced is the same as the probability that any region of
the whole sequence L has not been determined and, therefore, is equivalent to the fraction
of the whole sequence that has yet to be determined. Thus, at one-fold coverage,
approximately 37% of a polynucleotide of size L, in nucleotides, has not been sequenced.
When 14 Mb of sequence has been generated, coverage is 5X for a 2.8 Mb genome and the
unsequenced fraction drops to 0.0067 or 0.67%. 5X coverage of a 2.8 Mb sequence can be
attained by sequencing approximately 17,000 random clones from both insert ends with
an average sequence read length of 410 bp.
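
The coverage figures above can be checked with a short calculation. The following is a minimal sketch, taking the fold coverage as the total number of bases read divided by the genome length; the function and variable names are illustrative only and do not appear in the disclosure.

```python
import math


def unsequenced_fraction(bases_read: float, genome_size: float) -> float:
    """Poisson estimate P0 = e^-m of the fraction of bases not yet sequenced."""
    fold_coverage = bases_read / genome_size
    return math.exp(-fold_coverage)


GENOME_SIZE = 2.8e6  # bp, the genome size used in the text

# 1X and 5X coverage, as in the worked figures above.
p0_at_1x = unsequenced_fraction(2.8e6, GENOME_SIZE)
p0_at_5x = unsequenced_fraction(14e6, GENOME_SIZE)

# Coverage reached by reading 17,000 clones from both insert ends at 410 bp per read.
reads = 17_000 * 2
coverage_from_clones = reads * 410 / GENOME_SIZE

print(f"P0 at 1X: {p0_at_1x:.2f}")
print(f"P0 at 5X: {p0_at_5x:.4f}")
print(f"coverage from clones: {coverage_from_clones:.2f}X")
```

Under these assumptions the 17,000-clone figure corresponds to slightly under 5X coverage, consistent with the word "approximately" in the text.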

Similarly, the total gap length, G, is determined by the equation G = Le^-m, and the
average gap size, g, follows the equation g = L/n, where n in this equation is the number
of random sequence reads (about 34,000 at 5X coverage). Thus, 5X coverage leaves about 240
gaps averaging about 82 bp in size in a sequence of a polynucleotide 2.8 Mb long.

The treatment above is essentially that of Lander and Waterman, Genomics 2:231 (1988).
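
The gap estimates can be reproduced in the same way. This sketch assumes that the divisor in the average gap size is the number of sequence reads (here 34,000 reads of 410 bp, i.e. 17,000 clones read from both ends), an interpretation inferred from the stated 82 bp figure; the function name and return layout are hypothetical.

```python
import math


def gap_statistics(genome_size: float, n_reads: int, read_length: float):
    """Return (total gap length, average gap size, expected number of gaps)."""
    fold_coverage = n_reads * read_length / genome_size
    total_gap = genome_size * math.exp(-fold_coverage)  # G = L * e^-m
    average_gap = genome_size / n_reads                 # g = L / (number of reads)
    number_of_gaps = total_gap / average_gap            # equals n_reads * e^-m
    return total_gap, average_gap, number_of_gaps


total_gap, average_gap, number_of_gaps = gap_statistics(2.8e6, 34_000, 410)
print(f"total gap length: {total_gap:.0f} bp")
print(f"average gap size: {average_gap:.0f} bp")
print(f"expected gaps:    {number_of_gaps:.0f}")
```

With these inputs the average gap comes out near 82 bp and the expected number of gaps falls in the low 230s, in reasonable agreement with the rounded figures quoted in the text.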

2. Random Library Construction 

In order to approximate the random model described above during actual
sequencing, a nearly ideal library of cloned genomic fragments is required. The following
library construction procedure was developed to achieve this end.

Enterococcus faecalis DNA is prepared by phenol extraction. A mixture
containing 200 µg DNA in 1.0 ml of 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM
Na-EDTA, 50% glycerol is processed through a nebulizer (IPI Medical Products) with a
stream of nitrogen adjusted to 35 kPa for 2 minutes. The nebulized DNA is ethanol
precipitated and redissolved in 500 µl TE buffer.

To create blunt-ends, a 100 µl aliquot of the resuspended DNA is digested with 5
units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 µl BAL31
buffer. The digested DNA is phenol-extracted, ethanol-precipitated, redissolved in 100 µl
TE buffer, and then size-fractionated by electrophoresis through a 1.0% low melting
temperature agarose gel. The section containing DNA fragments 1.6-2.0 kb in size is
excised from the gel, and the LGT agarose is melted and the resulting solution is extracted
with phenol to separate the agarose from the DNA. DNA is ethanol precipitated and
redissolved in 20 µl of TE buffer for ligation to vector.

A two-step ligation procedure is used to produce a plasmid library with 97%
inserts, of which >99% were single inserts. The first ligation mixture (50 µl) contains 2
µg of DNA fragments, 2 µg pUC18 DNA (Pharmacia) cut with SmaI and
dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase
(GIBCO/BRL) and is incubated at 14°C for 4 hr. The ligation mixture then is phenol
extracted and ethanol precipitated, and the precipitated DNA is dissolved in 20 µl TE
buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete bands in a ladder
are visualized by ethidium bromide-staining and UV illumination and identified by size as
insert (i), vector (v), v+i, v+2i, v+3i, etc. The portion of the gel containing v+i DNA is
excised and the v+i DNA is recovered and resuspended into 20 µl TE. The v+i DNA then
is blunt-ended by T4 polymerase treatment for 5 min. at 37°C in a reaction mixture (50
µl) containing the v+i linears, 500 µM each of the 4 dNTPs, and 9 units of T4
polymerase (New England BioLabs), under recommended buffer conditions. After phenol
extraction and ethanol precipitation the repaired v+i linears are dissolved in 20 µl TE.
The final ligation to produce circles is carried out in a 50 µl reaction containing 5 µl of
v+i linears and 5 units of T4 ligase at 14°C overnight. After 10 min. at 70°C the
following day, the reaction mixture is stored at -20°C.

This two-stage procedure results in a molecularly random collection of single-insert
plasmid recombinants with minimal contamination from double-insert chimeras
(<1%) or free vector (<3%).

Since deviation from randomness can arise from propagation of the DNA in the
host, E. coli host cells deficient in all recombination and restriction functions (A.
Greener, Strategies 3 (J):5 (1990)) are used to prevent rearrangements, deletions, and loss
of clones by restriction. Furthermore, transformed cells are plated directly on antibiotic
diffusion plates to avoid the usual broth recovery phase which allows multiplication and
selection of the most rapidly growing cells.

Plating is carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II
Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a chilled
Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M beta-mercaptoethanol is added to the
aliquot of cells to a final concentration of 25 mM. Cells are incubated on ice for 10 min.
A 1 µl aliquot of the final ligation is added to the cells and incubated on ice for 30 min.
The cells are heat pulsed for 30 sec. at 42°C and placed back on ice for 2 min. The
outgrowth period in liquid culture is eliminated from this protocol in order to minimize
the preferential growth of any given transformed cell. Instead the transformation
mixture is plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of
SOB agar (5% SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar
per liter of media). The 5 ml bottom layer is supplemented with 0.4 ml of 50 mg/ml
ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1
ml X-Gal (2%), 1 ml MgCl2 (1 M), and 1 ml MgSO4/100 ml SOB agar. The 15 ml top
layer is poured just prior to plating. Our titer is approximately 100 colonies/10 µl aliquot
of transformation.
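
As a worked check of the beta-mercaptoethanol addition in the plating protocol above, the final concentration follows from simple dilution (stock concentration times stock volume, divided by the combined volume). This is an illustrative sketch only; the function name is hypothetical and the calculation assumes the volumes are additive.

```python
def final_concentration_mM(stock_molar: float, stock_ul: float, sample_ul: float) -> float:
    """Final concentration (mM) after adding stock_ul of stock to sample_ul of sample."""
    total_volume_ul = stock_ul + sample_ul  # assumes additive volumes
    return stock_molar * 1000.0 * stock_ul / total_volume_ul


# 1.7 µl of 1.42 M beta-mercaptoethanol added to a 100 µl aliquot of cells.
bme_mM = final_concentration_mM(stock_molar=1.42, stock_ul=1.7, sample_ul=100.0)
print(f"final beta-mercaptoethanol concentration: {bme_mM:.1f} mM")
```

The computed value is a little under 24 mM, which is close to the nominal 25 mM stated in the protocol.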

25 All colonies are picked for template preparation regardless of size. Thus, only 

clones lost due to "poison" DNA or deleterious gene products are deleted from the library, 
resulting in a slight increase in gap number over that expected. 

3. Random DNA Sequencing 

High quality double stranded DNA plasmid templates are prepared using a "boiling
bead" method developed in collaboration with Advanced Genetic Technology Corp.
(Gaithersburg, MD) (Adams et al., Science 252:1651 (1991); Adams et al., Nature
355:632 (1992)). Plasmid preparation is performed in a 96-well format for all stages of
DNA preparation from bacterial growth through final DNA purification. Template
concentration is determined using Hoechst Dye and a Millipore Cytofluor. DNA
concentrations are not adjusted, but low-yielding templates are identified where possible
and not sequenced.

Templates are also prepared from an Enterococcus faecalis lambda genomic
library in the vector DASH II (Stratagene). In particular, Enterococcus faecalis DNA (>
100 kb) is partially digested in a reaction mixture (200 µl) containing 50 µg DNA, 1X
Sau3AI buffer, 20 units Sau3AI for 6 min. at 23°C. The digested DNA is phenol-extracted
and fractionated by sucrose density gradient centrifugation. Fractions of the
sucrose gradient containing 15 to 25 kb are recovered in a final volume of 6 µl. One µl
of fragments is used with 1 µl of lambda DASH II vector (Stratagene) in the recommended
ligation reaction. One µl of the ligation mixture is used per packaging reaction following
the recommended protocol with the Gigapack II XL Packaging Extract (Stratagene,
#227711). Phage are plated directly without amplification from the packaging mixture
(after dilution with 500 µl of recommended SM buffer and chloroform treatment). Yield
is about 2.5x10^3 pfu/µl. An amplified library is prepared by infecting restructure NM539
host E. coli cells with approximately 1x10^4 phage particles and recovering the progeny
phage particles. The recovered phage is stored frozen in 7% dimethylsulfoxide. The
phage titer is approximately 1x10^9 pfu/ml.

For high throughput sequencing of individual lambda phage clones, liquid lysates
(100 µl) are prepared from randomly selected plaques (from the unamplified library) and
template is prepared by long-range PCR using T7 and T3 vector-specific primers.

Sequencing reactions are carried out on plasmid and/or PCR templates using the
AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer
Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1)
primers (Adams et al., Nature 368:414 (1994)). Dye terminator sequencing reactions are
carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using the
Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. T7 and T3
primers are used to sequence the ends of the inserts from the Lambda DASH II library.
Sequencing reactions are performed by eight individuals using an average of fourteen AB
373 DNA Sequencers per day. All sequencing reactions are analyzed using the Stretch
modification of the AB 373, primarily using a 34 cm well-to-read distance. The overall
sequencing success rate is approximately 85% for M13-21 and M13RP1
sequences and 65% for dye-terminator reactions. The average usable read length is 485
bp for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp for dye-terminator
reactions.

Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND
ANALYSIS, M. D. Adams, C. Fields, J. C. Venter, Eds., Academic Press, London, (1994)
described the value of using sequence from both ends of sequencing templates to facilitate
ordering of contigs in shotgun assembly projects of lambda and cosmid clones. We
balance the desirability of both-end sequencing (including the reduced cost of lower total
number of templates) against shorter read-lengths for sequencing reactions performed
with the M13RP1 (reverse) primer compared to the M13-21 (forward) primer.
Approximately one-half of the templates are sequenced from both ends. Random reverse
sequencing reactions are done based on successful forward sequencing reactions. Some
M13RP1 sequences are obtained in a semi-directed fashion: M13-21 sequences pointing
outward at the ends of contigs are chosen for M13RP1 sequencing in an effort to
specifically order contigs.

4. Protocol for Automated Cycle Sequencing 

The sequencing was carried out using ABI Catalyst robots and AB 373 Automated
DNA Sequencers. The Catalyst robot is a publicly available sophisticated pipetting and
temperature control robot which has been developed specifically for DNA sequencing
reactions. The Catalyst combines pre-aliquoted templates and reaction mixes consisting
of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently-
labelled sequencing primers, and reaction buffer. Reaction mixes and templates are
combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive
cycles of linear amplification (i.e., one primer synthesis) steps are performed including
denaturation, annealing of primer and template, and extension; i.e., DNA synthesis. A
heated lid with rubber gaskets on the thermocycling plate prevents evaporation without
the need for an oil overlay.

Two sequencing protocols are used: one for dye-labelled primers and a second for
dye-labelled dideoxy chain terminators. The shotgun sequencing involves use of four dye-
labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-
primer is labelled with a different fluorescent dye, permitting the four individual reactions
to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection,
and base-calling. ABI currently supplies pre-mixed reaction mixes in bulk packages
containing all the necessary non-template reagents for sequencing. Sequencing can be
done with both plasmid and PCR-generated templates with both dye-primers and dye-
terminators with approximately equal fidelity, although plasmid templates generally give
longer usable sequences.

Thirty-two reactions are loaded per AB 373 Sequencer each day, for a total of 960
samples. Electrophoresis is run overnight following the manufacturer's protocols, and the
data is collected for twelve hours. Following electrophoresis and fluorescence detection,
the ABI 373 performs automatic lane tracking and base-calling. The lane tracking is
confirmed visually. Each sequence electropherogram (or fluorescence lane trace) is
inspected visually and assessed for quality. Trailing sequences of low quality are removed
and the sequence itself is loaded via software to a Sybase database (archived daily to 8mm
tape). Leading vector polylinker sequence is removed automatically by a software
program. Average edited lengths of sequences from the standard ABI 373 are around 400
bp and depend mostly on the quality of the template used for the sequencing reaction.
ABI 373 Sequencers converted to Stretch Liners provide a longer electrophoresis path
prior to fluorescence detection and increase the average number of usable bases to 500-
600 bp.

INFORMATICS 

1. Data Management 

A number of information management systems for a large-scale sequencing lab
have been developed. (For review see, for instance, Kerlavage et al., Proceedings of the
Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE
Computer Society Press, Washington D.C., 585 (1993).) The system used to collect and
assemble the sequence data was developed using the Sybase relational database
management system and was designed to automate data flow wherever possible and to
reduce user error. The database stores and correlates all information collected during the
entire operation from template preparation to final analysis of the genome. Because the
raw output of the ABI 373 Sequencers was based on a Macintosh platform and the data
management system chosen is based on a Unix platform, it was necessary to design and
implement a variety of multi-user, client-server applications which allow the raw data as
well as analysis results to flow seamlessly into the database with a minimum of user effort.

2. Assembly 

An assembly engine (TIGR Assembler) developed for the rapid and accurate
assembly of thousands of sequence fragments is employed to generate contigs. The TIGR
Assembler simultaneously clusters and assembles fragments of the genome. In order to
obtain the speed necessary to assemble more than 10^4 fragments, the algorithm builds a
hash table of 10 bp oligonucleotide subsequences to generate a list of potential sequence
fragment overlaps. The number of potential overlaps for each fragment determines
which fragments are likely to fall into repetitive elements. Beginning with a single seed
sequence fragment, TIGR Assembler extends the current contig by attempting to add the
best matching fragment based on oligonucleotide content. The contig and candidate
fragment are aligned using a modified version of the Smith-Waterman algorithm which
provides for optimal gapped alignments (Waterman, M. S., Methods in Enzymology
164:165 (1988)). The contig is extended by the fragment only if strict criteria for the
quality of the match are met. The match criteria include the minimum length of overlap,
the maximum length of an unmatched end, and the minimum percentage match. These
criteria are automatically lowered by the algorithm in regions of minimal coverage and
raised in regions with a possible repetitive element.
Fragments representing the boundaries of repetitive elements and potentially chimeric
fragments are often rejected based on partial mismatches at the ends of alignments and
excluded from the current contig. TIGR Assembler is designed to take advantage of clone
size information coupled with sequencing from both ends of each template. It enforces
the constraint that sequence fragments from two ends of the same template point toward
one another in the contig and are located within a certain range of base pairs (definable
for each clone based on the known clone size range for a given library).
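
The hash-table step described above can be illustrated with a minimal Python sketch. This is not the TIGR Assembler implementation: the function names (build_kmer_index, candidate_overlaps, overlap_counts) are hypothetical, reverse-complement matches are ignored, and a simple count of shared 10 bp subsequences is assumed here as a stand-in for the "oligonucleotide content" score.

```python
from collections import defaultdict
from itertools import combinations

K = 10  # oligonucleotide length used for the hash table, as in the text


def build_kmer_index(fragments, k=K):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=K):
    """Count distinct shared k-mers for every pair of fragments."""
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


def overlap_counts(fragments, k=K):
    """Number of potential overlap partners per fragment.

    An unusually high count suggests the fragment lies in a repeat.
    """
    partners = defaultdict(set)
    for a, b in candidate_overlaps(fragments, k):
        partners[a].add(b)
        partners[b].add(a)
    return {frag_id: len(partners[frag_id]) for frag_id in fragments}
```

In such a scheme, only the pairs returned by candidate_overlaps would be passed on to the slower gapped alignment step, which is what keeps the assembly of more than 10^4 fragments tractable.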

The process resulted in 982 contigs as represented by SEQ ID NOs: 1-982. 

3. Identifying Genes 

The predicted coding regions of the Enterococcus faecalis genome were initially
defined with the program GeneMark, which finds ORFs using a probabilistic classification
technique. The predicted coding region sequences were used in searches against a database
of all Enterococcus faecalis nucleotide sequences from GenBank (March, 1997), using the
BLASTN search method to identify overlaps of 50 or more nucleotides with at least a
95% identity. Those ORFs with nucleotide sequence matches are shown in Table 1. The
ORFs without such matches were translated to protein sequences and compared to a non-
redundant database of known proteins generated by combining the Swiss-Prot, PIR and
GenPept databases. ORFs that matched a database protein with BLASTP probability less
than or equal to 0.01 are shown in Table 2. The table also lists assigned functions based
on the closest match in the databases. ORFs that did not match protein or nucleotide
sequences in the databases at these levels are shown in Table 3.
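
The triage of ORFs into Tables 1-3 can be summarized as a short Python sketch. The thresholds are those stated above; the record layout (OrfHits) and the function name assign_table are hypothetical and are introduced only for illustration, not taken from the software actually used.

```python
from dataclasses import dataclass
from typing import Optional

MIN_OVERLAP_NT = 50      # BLASTN: overlap of 50 or more nucleotides
MIN_IDENTITY_PCT = 95.0  # BLASTN: at least 95% identity
MAX_BLASTP_PROB = 0.01   # BLASTP: probability less than or equal to 0.01


@dataclass
class OrfHits:
    """Best search results for one predicted ORF (hypothetical layout)."""
    orf_id: str
    blastn_overlap_nt: int = 0
    blastn_identity_pct: float = 0.0
    blastp_prob: Optional[float] = None  # None: no protein match reported


def assign_table(hits: OrfHits) -> int:
    """Return 1, 2 or 3 according to the criteria described in the text."""
    if (hits.blastn_overlap_nt >= MIN_OVERLAP_NT
            and hits.blastn_identity_pct >= MIN_IDENTITY_PCT):
        return 1  # matches a known E. faecalis nucleotide sequence
    if hits.blastp_prob is not None and hits.blastp_prob <= MAX_BLASTP_PROB:
        return 2  # matches a known protein
    return 3      # no match at these levels
```

The nucleotide test is applied first, so an ORF that satisfies both criteria is placed in Table 1, consistent with the order of the searches described above.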

ILLUSTRATIVE APPLICATIONS 

1. Production of an Antibody to an Enterococcus faecalis Protein

Substantially pure protein or polypeptide is isolated from the transfected or
transformed cells using any one of the methods known in the art. The protein can also
be produced in a recombinant prokaryotic expression system, such as E. coli, or can be
chemically synthesized. Concentration of protein in the final preparation is adjusted, for
example, by concentration on an Amicon filter device, to the level of a few
micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared
as follows.

2. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes of any of the peptides identified and isolated as
described can be prepared from murine hybridomas according to the classical method of
Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods
thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected
protein over a period of a few weeks. The mouse is then sacrificed, and the antibody
producing cells of the spleen isolated. The spleen cells are fused by means of
polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by
growth of the system on selective media comprising aminopterin (HAT media). The
successfully fused cells are diluted and aliquots of the dilution placed in wells of a
microtiter plate where growth of the culture is continued. Antibody-producing clones are
identified by detection of antibody in the supernatant fluid of the wells by immunoassay
procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419
(1980), and modified methods thereof. Selected positive clones can be expanded and
their monoclonal antibody product harvested for use. Detailed procedures for monoclonal
antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology,
Elsevier, New York. Section 21-2 (1989).

3. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogenous epitopes of a single
protein can be prepared by immunizing suitable animals with the expressed protein
described above, which can be unmodified or modified to enhance immunogenicity.
Effective polyclonal antibody production is affected by many factors related both to the
antigen and the host species. For example, small molecules tend to be less immunogenic
than others and may require the use of carriers and adjuvant. Also, host animals vary in
response to site of inoculations and dose, with both inadequate and excessive doses of
antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at
multiple intradermal sites appear to be most reliable. An effective immunization
protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab.
33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when
antibody titer thereof, as determined semi-quantitatively, for example, by double
immunodiffusion in agar against known concentrations of the antigen, begins to fall. See,
for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental
Immunology, Wier, D., ed., Blackwell (1973). Plateau concentration of antibody is
usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM). Affinity of the
antisera for the antigen is determined by preparing competitive binding curves, as



WO 98/50555 



PCT/US98/08985 



described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology,
second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology, Washington, D.C.
(1980).

Antibody preparations prepared according to either protocol are useful in
quantitative immunoassays which determine concentrations of antigen-bearing substances
in biological samples; they are also used semi-quantitatively or qualitatively to identify
the presence of antigen in a biological sample. In addition, antibodies are useful in various
animal models of enterococcal disease as a means of evaluating the protein used to make
the antibody as a potential vaccine target or as a means of evaluating the antibody as a
potential immunotherapeutic or immunoprophylactic reagent.

4. Preparation of PCR Primers and Amplification of DNA 

Various fragments of the Enterococcus faecalis genome, such as those of Tables
1-3 and SEQ ID NOS: 1-982, can be used, in accordance with the present invention, to
prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15
bases, and more preferably at least 18 bases in length. When selecting a primer sequence,
it is preferred that the primer pairs have approximately the same G/C ratio, so that
melting temperatures are approximately the same. The PCR primers and amplified DNA
of this Example find use in the Examples that follow.
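
The primer selection criteria above (length and matched G/C ratio) can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the Wallace rule, Tm = 2(A+T) + 4(G+C), is a common rule-of-thumb estimate that is not part of the original text, and the function names and the 0.10 tolerance on G/C difference are hypothetical choices made only for illustration.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    s = primer.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C) (assumed rule)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def check_primer_pair(fwd: str, rev: str, min_len: int = 15,
                      max_gc_diff: float = 0.10) -> dict:
    """Report whether a primer pair meets the stated preferences."""
    return {
        "long_enough": len(fwd) >= min_len and len(rev) >= min_len,
        "preferred_length": len(fwd) >= 18 and len(rev) >= 18,
        "gc_matched": abs(gc_fraction(fwd) - gc_fraction(rev)) <= max_gc_diff,
        "tm_fwd": wallace_tm(fwd),
        "tm_rev": wallace_tm(rev),
    }
```

A pair that fails the gc_matched test would be expected to show a larger gap between tm_fwd and tm_rev, which is the practical reason for matching the G/C ratio.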


5. Isolation of a Selected DNA Clone From the Deposited 
Sample of E. faecalis 

Three approaches can be used to isolate an E. faecalis clone comprising a
polynucleotide of the present invention from any E. faecalis genomic DNA library. The
E. faecalis strain V586 has been deposited as a convenient source for obtaining an E.
faecalis strain, although a wide variety of E. faecalis strains known in the art
can be used.

E. faecalis genomic DNA is prepared using the following method. A 20 ml
overnight bacterial culture is grown in a rich medium (e.g., Trypticase Soy Broth, Brain
Heart Infusion broth or Super broth), pelleted, washed two times with TES (30 mM Tris-pH
8.0, 25 mM EDTA, 50 mM NaCl), and resuspended in 5 ml high salt TES (2.5 M NaCl).
Lysostaphin is added to a final concentration of approx. 50 µg/ml and the mixture is rotated
slowly for 1 hour at 37°C to make protoplast cells. The solution is then placed in an incubator
(or in a shaking water bath) and warmed to 55°C. Five hundred microliters of 20%
sarcosyl in TES (final concentration 2%) is then added to lyse the cells. Next, guanidine
HCl is added to a final concentration of 7 M (3.69 g in 5.5 ml). The mixture is swirled
slowly at 55°C for 60-90 min (solution should clear). A CsCl gradient is then set up in
SW41 ultra clear tubes using 2.0 ml 5.7 M CsCl and overlaying with 2.85 M CsCl. The
gradient is carefully overlaid with the DNA-containing GuHCl solution. The gradient is
spun at 30,000 rpm, 20°C for 24 hr and the lower DNA band is collected. The volume is
increased to 5 ml with TE buffer. The DNA is then treated with protease K (10 µg/ml)
overnight at 37°C, and precipitated with ethanol. The precipitated DNA is resuspended in
a desired buffer.
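
The reagent amounts quoted in the lysis steps can be cross-checked with simple arithmetic, sketched below in Python. The molar mass of guanidine hydrochloride (95.53 g/mol) is an assumption supplied here and is not stated in the text; the helper names are hypothetical, and the volume contributed by the dissolved solid is ignored.

```python
GUHCL_MW = 95.53  # g/mol, guanidine hydrochloride (assumed; not given in the text)


def final_percent(stock_percent: float, stock_ml: float, sample_ml: float) -> float:
    """Final concentration (%) after adding a stock solution to a sample."""
    return stock_percent * stock_ml / (stock_ml + sample_ml)


def grams_for_molarity(molar: float, volume_ml: float, mw: float) -> float:
    """Mass of solute (g) needed for a given molarity in a given volume."""
    return molar * (volume_ml / 1000.0) * mw


# 500 microliters of 20% sarcosyl added to the 5 ml cell suspension
sarcosyl_pct = final_percent(20.0, 0.5, 5.0)

# 7 M guanidine HCl in the resulting 5.5 ml
guhcl_g = grams_for_molarity(7.0, 5.5, GUHCL_MW)
```

Under these assumptions sarcosyl_pct evaluates to about 1.8%, which the text rounds to 2%, and guhcl_g to about 3.68 g, in line with the 3.69 g quoted.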

In the first method, a plasmid is directly isolated by screening a plasmid E.
faecalis genomic DNA library using a polynucleotide probe corresponding to a
polynucleotide of the present invention. Particularly, a specific polynucleotide with 30-
40 nucleotides is synthesized using an Applied Biosystems DNA synthesizer according to
the sequence reported. The oligonucleotide is labeled, for instance, with 32P-γ-ATP using
T4 polynucleotide kinase and purified according to routine methods. (See, e.g., Maniatis
et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring
Harbor, NY (1982).) The library is transformed into a suitable host, as indicated above (such as
XL-1 Blue (Stratagene)) using techniques known to those of skill in the art. See, e.g.,
Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL (Cold Spring
Harbor, N.Y. 2nd ed. 1989); Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY (John Wiley and Sons, N.Y. 1989). The transformants are plated on 1.5%
agar plates (containing the appropriate selection agent, e.g., ampicillin) to a density of
about 150 transformants (colonies) per plate. These plates are screened using Nylon
membranes according to routine methods for bacterial colony screening. See, e.g.,
Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL (Cold Spring
Harbor, N.Y. 2nd ed. 1989); Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY (John Wiley and Sons, N.Y. 1989) or other techniques known to those of skill
in the art.

Alternatively, two primers of 15-25 nucleotides derived from the 5' and 3' ends of
a polynucleotide of SEQ ID NOS: 1-982 are synthesized and used to amplify the desired
DNA by PCR using an E. faecalis genomic DNA prep as a template. PCR is carried out
under routine conditions, for instance, in 25 µl of reaction mixture with 0.5 µg of the
above DNA template. A convenient reaction mixture is 1.5-5 mM MgCl2, 0.01% (w/v)
gelatin, 20 µM each of dATP, dCTP, dGTP, dTTP, 25 pmol of each primer and 0.25
Unit of Taq polymerase. Thirty-five cycles of PCR (denaturation at 94°C for 1 min;
annealing at 55°C for 1 min; elongation at 72°C for 1 min) are performed with a Perkin-
Elmer Cetus automated thermal cycler. The amplified product is analyzed by agarose gel
electrophoresis and the DNA band with expected molecular weight is excised and purified.
The PCR product is verified to be the selected sequence by subcloning and sequencing the
DNA product.

Finally, overlapping oligos of the DNA sequences of SEQ ID NOS: 1-982 can be
chemically synthesized and used to generate a nucleotide sequence of desired length using
PCR methods known in the art.
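
For the second approach above, deriving the two end primers from a target sequence can be sketched in Python as follows. The function names (reverse_complement, end_primers) are hypothetical. The sketch assumes the target is given as the 5'-to-3' coding strand, takes the forward primer directly from the 5' end, and takes the reverse primer as the reverse complement of the 3' end.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_primers(target: str, length: int = 20) -> tuple:
    """Return (forward, reverse) primers, each written 5' to 3'.

    The 15-25 nucleotide range follows the text; primers are taken from
    the extreme ends of the target without any further optimization.
    """
    if not 15 <= length <= 25:
        raise ValueError("primer length should be 15-25 nucleotides")
    if len(target) < 2 * length:
        raise ValueError("target too short for two non-overlapping primers")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse
```

For example, end_primers(seq, 18) would return a pair of 18-mers for a sequence seq taken from one of the contigs; in practice the candidate pair would still be screened for matched G/C content before synthesis.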


6(a). Expression and Purification of Enterococcal Polypeptides
in E. coli

The bacterial expression vector pQE60 was used for bacterial expression of some
of the polypeptide fragments of the present invention which were used in the soft tissue
and systemic infection models discussed below. (QIAGEN, Inc., 9259 Eton Avenue,
Chatsworth, CA, 91311). pQE60 encodes ampicillin antibiotic resistance ("Ampr") and
contains a bacterial origin of replication ("ori"), an IPTG inducible promoter, a ribosome
binding site ("RBS"), six codons encoding histidine residues that allow affinity purification
using nickel-nitrilo-tri-acetic acid ("Ni-NTA") affinity resin (QIAGEN, Inc., supra) and
suitable single restriction enzyme cleavage sites. These elements are arranged such that
an inserted DNA fragment encoding a polypeptide expresses that polypeptide with the six
His residues (i.e., a "6 X His tag") covalently linked to the carboxyl terminus of that
polypeptide.

The DNA sequence encoding the desired portion of an E. faecalis protein of the
present invention was amplified from E. faecalis genomic DNA using PCR
oligonucleotide primers which anneal to the 5' and 3' sequences coding for the portions
of the E. faecalis polynucleotide shown in SEQ ID NOS: 1-982. Additional nucleotides
containing restriction sites to facilitate cloning in the pQE60 vector are added to the 5'
and 3' sequences, respectively.

For cloning the mature protein, the 5' primer has a sequence containing an
appropriate restriction site followed by nucleotides of the amino terminal coding sequence
of the desired E. faecalis polynucleotide sequence in SEQ ID NOS: 1-982. One of ordinary
skill in the art would appreciate that the point in the protein coding sequence where the 5'
and 3' primers begin may be varied to amplify a DNA segment encoding any desired
portion of the complete protein shorter or longer than the mature form. The 3' primer
has a sequence containing an appropriate restriction site followed by nucleotides
complementary to the 3' end of the polypeptide coding sequence of SEQ ID NOS: 1-982,
excluding a stop codon, with the coding sequence aligned with the restriction site so as to
maintain its reading frame with that of the six His codons in the pQE60 vector.

The amplified E. faecalis DNA fragment and the vector pQE60 were digested
with restriction enzymes which recognize the sites in the primers and the digested DNAs
were then ligated together. The E. faecalis DNA was inserted into the restricted pQE60
vector in a manner which places the E. faecalis protein coding region downstream from
the IPTG-inducible promoter and in-frame with an initiating AUG and the six histidine
codons.

The ligation mixture was transformed into competent E. coli cells using standard
procedures such as those described by Sambrook et al., supra. E. coli strain M15/rep4,
containing multiple copies of the plasmid pREP4, which expresses the lac repressor and
confers kanamycin resistance ("Kanr"), was used in carrying out the illustrative example
described herein. This strain, which was only one of many that are suitable for expressing
an E. faecalis polypeptide, is available commercially (QIAGEN, Inc., supra).
Transformants were identified by their ability to grow on LB agar plates in the presence
of ampicillin and kanamycin. Plasmid DNA was isolated from resistant colonies and the
identity of the cloned DNA confirmed by restriction analysis, PCR and DNA sequencing.

Clones containing the desired constructs were grown overnight ("O/N") in liquid
culture in LB media supplemented with both ampicillin (100 µg/ml) and kanamycin (25
µg/ml). The O/N culture was used to inoculate a large culture, at a dilution of
approximately 1:25 to 1:250. The cells were grown to an optical density at 600 nm
("OD600") of between 0.4 and 0.6. Isopropyl-β-D-thiogalactopyranoside ("IPTG") was
then added to a final concentration of 1 mM to induce transcription from the lac
repressor sensitive promoter, by inactivating the lacI repressor. Cells subsequently were
incubated further for 3 to 4 hours. Cells then were harvested by centrifugation.

The cells were then stirred for 3-4 hours at 4°C in 6 M guanidine-HCl, pH 8. The
cell debris was removed by centrifugation, and the supernatant containing the E. faecalis
polypeptide was loaded onto a nickel-nitrilo-tri-acetic acid ("Ni-NTA") affinity resin
column (QIAGEN, Inc., supra). Proteins with a 6 x His tag bind to the Ni-NTA resin
with high affinity and can be purified in a simple one-step procedure (for details see: The
QIAexpressionist, 1995, QIAGEN, Inc., supra). Briefly, the supernatant was loaded onto
the column in 6 M guanidine-HCl, pH 8, the column was first washed with 10 volumes of
6 M guanidine-HCl, pH 8, then washed with 10 volumes of 6 M guanidine-HCl, pH 6, and
finally the E. faecalis polypeptide was eluted with 6 M guanidine-HCl, pH 5.

The purified protein was then renatured by dialyzing it against phosphate-buffered
saline (PBS) or 50 mM Na-acetate, pH 6 buffer plus 200 mM NaCl. Alternatively, the
protein could be successfully refolded while immobilized on the Ni-NTA column. The
recommended conditions are as follows: renature using a linear 6 M-1 M urea gradient in
500 mM NaCl, 20% glycerol, 20 mM Tris/HCl pH 7.4, containing protease inhibitors.
The renaturation should be performed over a period of 1.5 hours or more. After
renaturation the proteins can be eluted by the addition of 250 mM imidazole.
Imidazole was removed by a final dialyzing step against PBS or 50 mM sodium acetate
pH 6 buffer plus 200 mM NaCl. The purified protein was stored at 4°C or frozen at
-80°C.

Some of the polypeptides of the present invention were prepared using a non-
denaturing protein purification method. For these polypeptides, the cell pellet from each
liter of culture was resuspended in 25 ml of Lysis Buffer A at 4°C (Lysis Buffer A = 50
mM Na-phosphate, 300 mM NaCl, 10 mM 2-mercaptoethanol, 10% Glycerol, pH 7.5
with 1 tablet of Complete EDTA-free protease inhibitor cocktail (Boehringer Mannheim
#1873580) per 50 ml of buffer). Absorbance at 550 nm was approximately 10-20
O.D./ml. The suspension was then put through three freeze/thaw cycles from -70°C
(using an ethanol-dry ice bath) up to room temperature. The cells were lysed via
sonication in short 10 sec bursts over 3 minutes at approximately 80 W while kept on ice.
The sonicated sample was then centrifuged at 15,000 RPM for 30 minutes at 4°C. The
supernatant was passed through a column containing 1.0 ml of CL-4B resin to pre-clear
the sample of any proteins that may bind to agarose non-specifically, and the flow-
through fraction was collected.

The pre-cleared flow-through was applied to a nickel-nitrilo-tri-acetic acid ("Ni-
NTA") affinity resin column (QIAGEN, Inc., supra). Proteins with a 6 X His tag bind to
the Ni-NTA resin with high affinity and can be purified in a simple one-step procedure.
Briefly, the supernatant was loaded onto the column in Lysis Buffer A at 4°C, the column
was first washed with 10 volumes of Lysis Buffer A until the A280 of the eluate returns to
the baseline. Then, the column was washed with 5 volumes of 40 mM Imidazole (92%
Lysis Buffer A / 8% Buffer B) (Buffer B = 50 mM Na-Phosphate, 300 mM NaCl, 10%
Glycerol, 10 mM 2-mercaptoethanol, 500 mM Imidazole, pH of the final buffer should be
7.5). The protein was eluted off of the column with a series of increasing Imidazole
solutions made by adjusting the ratios of Lysis Buffer A to Buffer B. Three different
concentrations were used: 3 volumes of 75 mM Imidazole, 3 volumes of 150 mM
Imidazole, 5 volumes of 500 mM Imidazole. The fractions containing the purified
protein were analyzed using 8%, 10% or 14% SDS-PAGE depending on the protein size.
The purified protein was then dialyzed 2X against phosphate-buffered saline (PBS) in
order to place it into an easily workable buffer. The purified protein was stored at 4°C or
frozen at -80°C.
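
The Buffer A / Buffer B ratios for each imidazole step follow from a simple dilution calculation, sketched below in Python. The function name buffer_mix is hypothetical. The sketch assumes, as the buffer recipes above indicate, that Lysis Buffer A contains no imidazole and Buffer B contains 500 mM imidazole, so that the fraction of Buffer B is the target concentration divided by 500.

```python
BUFFER_B_IMIDAZOLE_MM = 500.0  # imidazole in Buffer B; Lysis Buffer A has none


def buffer_mix(target_mm: float, total_ml: float,
               stock_mm: float = BUFFER_B_IMIDAZOLE_MM) -> dict:
    """Percentages and volumes of Buffers A and B for a target imidazole level."""
    if not 0 <= target_mm <= stock_mm:
        raise ValueError("target must lie between 0 and the Buffer B concentration")
    frac_b = target_mm / stock_mm
    return {
        "percent_A": 100.0 * (1.0 - frac_b),
        "percent_B": 100.0 * frac_b,
        "ml_A": total_ml * (1.0 - frac_b),
        "ml_B": total_ml * frac_b,
    }


# Wash and elution steps named in the text (mM imidazole)
steps = {mm: buffer_mix(mm, total_ml=100.0) for mm in (40, 75, 150, 500)}
```

For the 40 mM wash this gives 92% Lysis Buffer A and 8% Buffer B, matching the ratio stated above; by the same calculation the 75 mM, 150 mM and 500 mM steps correspond to 15%, 30% and 100% Buffer B.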

The following alternative method may be used to purify E. faecalis polypeptides
expressed in E. coli when they are present in the form of inclusion bodies. Unless
otherwise specified, all of the following steps are conducted at 4-10°C.

Upon completion of the production phase of the E. coli fermentation, the cell
culture is cooled to 4-10°C and the cells are harvested by continuous centrifugation at
15,000 rpm (Heraeus Sepatech). On the basis of the expected yield of protein per unit
weight of cell paste and the amount of purified protein required, an appropriate amount
of cell paste, by weight, is suspended in a buffer solution containing 100 mM Tris, 50 mM
EDTA, pH 7.4. The cells are dispersed to a homogeneous suspension using a high shear
mixer.

The cells are then lysed by passing the solution through a microfluidizer
(Microfluidics, Corp. or APV Gaulin, Inc.) twice at 4000-6000 psi. The homogenate is
then mixed with NaCl solution to a final concentration of 0.5 M NaCl, followed by
centrifugation at 7000 x g for 15 min. The resultant pellet is washed again using 0.5 M
NaCl, 100 mM Tris, 50 mM EDTA, pH 7.4.

The resulting washed inclusion bodies are solubilized with 1.5 M guanidine
hydrochloride (GuHCl) for 2-4 hours. After 7000 x g centrifugation for 15 min., the
pellet is discarded and the E. faecalis polypeptide-containing supernatant is incubated at
4°C overnight to allow further GuHCl extraction.

Following high speed centrifugation (30,000 x g) to remove insoluble particles,
the GuHCl solubilized protein is refolded by quickly mixing the GuHCl extract with 20
volumes of buffer containing 50 mM sodium, pH 4.5, 150 mM NaCl, 2 mM EDTA by
vigorous stirring. The refolded diluted protein solution is kept at 4°C without mixing for
12 hours prior to further purification steps.

To clarify the refolded E. faecalis polypeptide solution, a previously prepared
tangential filtration unit equipped with 0.16 µm membrane filter with appropriate surface
area (e.g., Filtron), equilibrated with 40 mM sodium acetate, pH 6.0 is employed. The
filtered sample is loaded onto a cation exchange resin (e.g., Poros HS-50, Perseptive
Biosystems). The column is washed with 40 mM sodium acetate, pH 6.0 and eluted with
250 mM, 500 mM, 1000 mM, and 1500 mM NaCl in the same buffer, in a stepwise
manner. The absorbance at 280 nm of the effluent is continuously monitored. Fractions
are collected and further analyzed by SDS-PAGE.

Fractions containing the E. faecalis polypeptide are then pooled and mixed with 4
volumes of water. The diluted sample is then loaded onto a previously prepared set of
tandem columns of strong anion (Poros HQ-50, Perseptive Biosystems) and weak anion
(Poros CM-20, Perseptive Biosystems) exchange resins. The columns are equilibrated
with 40 mM sodium acetate, pH 6.0. Both columns are washed with 40 mM sodium
acetate, pH 6.0, 200 mM NaCl. The CM-20 column is then eluted using a 10 column
volume linear gradient ranging from 0.2 M NaCl, 50 mM sodium acetate, pH 6.0 to 1.0
M NaCl, 50 mM sodium acetate, pH 6.5. Fractions are collected under constant A280
monitoring of the effluent. Fractions containing the E. faecalis polypeptide (determined,
for instance, by 16% SDS-PAGE) are then pooled.

The resultant E. faecalis polypeptide exhibits greater than 95% purity after the
above refolding and purification steps. No major contaminant bands are observed from
Coomassie blue stained 16% SDS-PAGE gel when 5 µg of purified protein is loaded. The
purified protein is also tested for endotoxin/LPS contamination, and typically the LPS
content is less than 0.1 ng/ml according to LAL assays.

6(b). Alternative Expression and Purification of Enterococcal
polypeptides in E. coli

The vector pQE10 was alternatively used to clone and express some of the
polypeptides of the present invention for use in the soft tissue and systemic infection
models discussed below. The difference is that an inserted DNA fragment
encoding a polypeptide expresses that polypeptide with the six His residues (i.e., a "6 X
His tag") covalently linked to the amino terminus of that polypeptide. The bacterial
expression vector pQE10 (QIAGEN, Inc., 9259 Eton Avenue, Chatsworth, CA, 91311)
was used in this example. The components of the pQE10 plasmid are arranged such that
the inserted DNA sequence encoding a polypeptide of the present invention expresses the
polypeptide with the six His residues (i.e., a "6 X His tag") covalently linked to the
amino terminus.

The DNA sequences encoding the desired portions of a polypeptide of SEQ ID
NOS: 1-982 were amplified using PCR oligonucleotide primers from genomic E. faecalis
DNA. The PCR primers anneal to the nucleotide sequences encoding the desired amino
acid sequence of a polypeptide of the present invention. Additional nucleotides
containing restriction sites to facilitate cloning in the pQE10 vector were added to the 5'
and 3' primer sequences, respectively.

For cloning a polypeptide of the present invention, the 5' and 3' primers were
selected to amplify their respective nucleotide coding sequences. One of ordinary skill in
the art would appreciate that the point in the protein coding sequence where the 5' and 3'
primers begin may be varied to amplify a DNA segment encoding any desired portion of
a polypeptide of the present invention. The 5' primer was designed so the coding
sequence of the 6 X His tag is aligned with the restriction site so as to maintain its reading
frame with that of the E. faecalis polypeptide. The 3' primer was designed to include a stop codon.
The amplified DNA fragment was then cloned, and the protein expressed, as described
above for the pQE60 plasmid.

The DNA sequences encoding the amino acid sequences of SEQ ID NOS: 1-982
may also be cloned and expressed as fusion proteins by a protocol similar to that described
directly above, wherein the pET-32b(+) vector (Novagen, 601 Science Drive, Madison,
WI 53711) is preferentially used in place of pQE10.
The above methods are not limited to the polypeptide fragments actually
produced. The above method, like the methods below, can be used to produce either full
length polypeptides or desired fragments thereof.

6(c). Alternative Expression and Purification of Enterococcal
polypeptides in E. coli

The bacterial expression vector pQE60 is used for bacterial expression in this
example (QIAGEN, Inc., 9259 Eton Avenue, Chatsworth, CA, 91311). However, in this
example, the polypeptide coding sequence is inserted such that translation of the six His
codons is prevented and, therefore, the polypeptide is produced with no 6 X His tag.

The DNA sequence encoding the desired portion of the E. faecalis amino acid
sequence is amplified from an E. faecalis genomic DNA prep or the deposited DNA clones
using PCR oligonucleotide primers which anneal to the 5' and 3' nucleotide sequences
corresponding to the desired portion of the E. faecalis polypeptides. Additional
nucleotides containing restriction sites to facilitate cloning in the pQE60 vector are added
to the 5' and 3' primer sequences.

For cloning an E. faecalis polypeptide of the present invention, 5' and 3' primers
are selected to amplify their respective nucleotide coding sequences. One of ordinary skill
in the art would appreciate that the point in the protein coding sequence where the 5' and
3' primers begin may be varied to amplify a DNA segment encoding any desired portion
of a polypeptide of the present invention. The 5' and 3' primers contain appropriate
restriction sites followed by nucleotides complementary to the 5' and 3' ends of the
coding sequence, respectively. The 3' primer is additionally designed to include an
in-frame stop codon.

The amplified E. faecalis DNA fragments and the vector pQE60 are digested with
restriction enzymes recognizing the sites in the primers and the digested DNAs are then
ligated together. Insertion of the E. faecalis DNA into the restricted pQE60 vector
places the E. faecalis protein coding region including its associated stop codon
downstream from the IPTG-inducible promoter and in-frame with an initiating AUG.
The associated stop codon prevents translation of the six histidine codons downstream of
the insertion point.

The ligation mixture is transformed into competent E. coli cells using standard
procedures such as those described by Sambrook et al. E. coli strain M15/rep4, containing
multiple copies of the plasmid pREP4, which expresses the lac repressor and confers
kanamycin resistance ("Kanr"), is used in carrying out the illustrative example described
herein. This strain, which is only one of many that are suitable for expressing E. faecalis
polypeptide, is available commercially (QIAGEN, Inc., supra). Transformants are
identified by their ability to grow on LB plates in the presence of ampicillin and
kanamycin. Plasmid DNA is isolated from resistant colonies and the identity of the
cloned DNA confirmed by restriction analysis, PCR and DNA sequencing.

Clones containing the desired constructs are grown overnight ("O/N") in liquid
culture in LB media supplemented with both ampicillin (100 µg/ml) and kanamycin (25
µg/ml). The O/N culture is used to inoculate a large culture, at a dilution of approximately
1:25 to 1:250. The cells are grown to an optical density at 600 nm ("OD600") of
between 0.4 and 0.6. Isopropyl-β-D-thiogalactopyranoside ("IPTG") is then added to a
final concentration of 1 mM to induce transcription from the lac repressor sensitive
promoter, by inactivating the lacI repressor. Cells subsequently are incubated further for
3 to 4 hours. Cells then are harvested by centrifugation.
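
The arithmetic behind the inoculation and induction steps above can be illustrated with a minimal Python sketch. It computes the volume of overnight culture needed for a given dilution and the volume of IPTG stock needed to reach 1 mM. The function names, the 1 liter culture volume and the 1 M IPTG stock are assumptions chosen only for illustration and are not part of the example; a 1:25 dilution is read here as one part inoculum in 25 parts final culture.

```python
def inoculum_volume_ml(culture_volume_ml: float, dilution_factor: float) -> float:
    """Volume of overnight culture for a 1:dilution_factor inoculation.

    Assumes "1:N" means one part inoculum in N parts final culture.
    """
    if culture_volume_ml <= 0 or dilution_factor <= 1:
        raise ValueError("culture volume must be > 0 and dilution factor > 1")
    return culture_volume_ml / dilution_factor


def stock_volume_ml(culture_volume_ml: float, final_mM: float, stock_mM: float) -> float:
    """Volume of stock to add so the culture reaches final_mM (C1*V1 = C2*V2).

    The small volume increase caused by adding the stock is ignored.
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    return culture_volume_ml * final_mM / stock_mM


if __name__ == "__main__":
    culture_ml = 1000.0  # hypothetical 1 liter expression culture
    for factor in (25, 250):  # the dilution range given in the text
        print(f"1:{factor} inoculum: {inoculum_volume_ml(culture_ml, factor)} ml")
    # hypothetical 1 M (1000 mM) IPTG stock, 1 mM final as in the text
    print(f"IPTG stock: {stock_volume_ml(culture_ml, 1.0, 1000.0)} ml")
```

Under these assumptions a 1 liter culture would take between 4 ml and 40 ml of overnight culture.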

To purify the E. faecalis polypeptide, the cells are then stirred for 3-4 hours at
4°C in 6M guanidine-HCl, pH 8. The cell debris is removed by centrifugation, and the
supernatant containing the E. faecalis polypeptide is dialyzed against 50 mM Na-acetate
buffer pH 6, supplemented with 200 mM NaCl. Alternatively, the protein can be
successfully refolded by dialyzing it against 500 mM NaCl, 20% glycerol, 25 mM Tris/HCl
pH 7.4, containing protease inhibitors. After renaturation the protein can be purified by
ion exchange, hydrophobic interaction and size exclusion chromatography.
Alternatively, an affinity chromatography step such as an antibody column can be used to
obtain pure E. faecalis polypeptide. The purified protein is stored at 4°C or frozen at
-80°C.

The following alternative method may be used to purify E. faecalis polypeptides
expressed in E. coli when they are present in the form of inclusion bodies. Unless otherwise
specified, all of the following steps are conducted at 4-10°C.

Upon completion of the production phase of the E. coli fermentation, the cell
culture is cooled to 4-10°C and the cells are harvested by continuous centrifugation at
15,000 rpm (Heraeus Sepatech). On the basis of the expected yield of protein per unit
weight of cell paste and the amount of purified protein required, an appropriate amount
of cell paste, by weight, is suspended in a buffer solution containing 100 mM Tris, 50 mM
EDTA, pH 7.4. The cells are dispersed to a homogeneous suspension using a high shear
mixer.

The cells are then lysed by passing the solution through a microfluidizer
(Microfluidics Corp. or APV Gaulin, Inc.) twice at 4000-6000 psi. The homogenate is
then mixed with NaCl solution to a final concentration of 0.5 M NaCl, followed by
centrifugation at 7000 x g for 15 min. The resultant pellet is washed again using 0.5 M
NaCl, 100 mM Tris, 50 mM EDTA, pH 7.4.
The resulting washed inclusion bodies are solubilized with 1.5 M guanidine
hydrochloride (GuHCl) for 2-4 hours. After 7000 x g centrifugation for 15 min., the
pellet is discarded and the E. faecalis polypeptide-containing supernatant is incubated at
4°C overnight to allow further GuHCl extraction.

Following high speed centrifugation (30,000 x g) to remove insoluble particles,
the GuHCl solubilized protein is refolded by quickly mixing the GuHCl extract with 20
volumes of buffer containing 50 mM sodium, pH 4.5, 150 mM NaCl, 2 mM EDTA by
vigorous stirring. The refolded diluted protein solution is kept at 4°C without mixing for
12 hours prior to further purification steps.
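
To make the effect of the rapid dilution step concrete, the following Python sketch estimates the residual denaturant concentration after one volume of extract is mixed with 20 volumes of refolding buffer. It assumes the extract still contains GuHCl at the 1.5 M solubilization concentration and that volumes are additive; the function name is hypothetical.

```python
def diluted_concentration(stock_conc: float, sample_volumes: float,
                          diluent_volumes: float) -> float:
    """Concentration after mixing a sample with a solute-free diluent.

    Assumes additive volumes and that the diluent contains none of the solute.
    """
    if sample_volumes <= 0 or diluent_volumes < 0:
        raise ValueError("volumes must be positive")
    return stock_conc * sample_volumes / (sample_volumes + diluent_volumes)


if __name__ == "__main__":
    # 1 volume of 1.5 M GuHCl extract mixed with 20 volumes of refolding buffer
    residual = diluted_concentration(1.5, 1.0, 20.0)
    print(f"Residual GuHCl after dilution: {residual:.4f} M")
```

Under these assumptions the denaturant is diluted 21-fold, which is the condition that allows the solubilized protein to refold.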

To clarify the refolded E. faecalis polypeptide solution, a previously prepared
tangential filtration unit equipped with a 0.16 µm membrane filter with appropriate surface
area (e.g., Filtron), equilibrated with 40 mM sodium acetate, pH 6.0, is employed. The
filtered sample is loaded onto a cation exchange resin (e.g., Poros HS-50, Perseptive
Biosystems). The column is washed with 40 mM sodium acetate, pH 6.0 and eluted with
250 mM, 500 mM, 1000 mM, and 1500 mM NaCl in the same buffer, in a stepwise
manner. The absorbance at 280 nm of the effluent is continuously monitored. Fractions
are collected and further analyzed by SDS-PAGE.

Fractions containing the E. faecalis polypeptide are then pooled and mixed with 4
volumes of water. The diluted sample is then loaded onto a previously prepared set of
tandem columns of strong anion (Poros HQ-50, Perseptive Biosystems) and weak anion
(Poros CM-20, Perseptive Biosystems) exchange resins. The columns are equilibrated
with 40 mM sodium acetate, pH 6.0. Both columns are washed with 40 mM sodium
acetate, pH 6.0, 200 mM NaCl. The CM-20 column is then eluted using a 10 column
volume linear gradient ranging from 0.2 M NaCl, 50 mM sodium acetate, pH 6.0 to 1.0
M NaCl, 50 mM sodium acetate, pH 6.5. Fractions are collected under constant A280
monitoring of the effluent. Fractions containing the E. faecalis polypeptide (determined,
for instance, by 16% SDS-PAGE) are then pooled.
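
The linear gradient described above can be sketched numerically. The following Python function returns the nominal NaCl concentration at a given point of a 10 column volume gradient running from 0.2 M to 1.0 M. It models the salt component only (the accompanying pH shift from 6.0 to 6.5 is not modeled), ignores system dead volume, and uses a hypothetical function name.

```python
def gradient_nacl_molar(cv_elapsed: float, start_m: float = 0.2,
                        end_m: float = 1.0, total_cv: float = 10.0) -> float:
    """Nominal NaCl molarity after cv_elapsed column volumes of a linear gradient."""
    if not 0.0 <= cv_elapsed <= total_cv:
        raise ValueError("cv_elapsed must lie within the gradient")
    return start_m + (end_m - start_m) * cv_elapsed / total_cv


if __name__ == "__main__":
    # Nominal salt concentration at each whole column volume of the gradient
    for cv in range(0, 11):
        print(f"{cv:2d} CV: {gradient_nacl_molar(cv):.2f} M NaCl")
```

Such a table may help relate the fraction in which the polypeptide elutes to an approximate salt concentration.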

The resultant E. faecalis polypeptide exhibits greater than 95% purity after the
above refolding and purification steps. No major contaminant bands are observed from
Coomassie blue stained 16% SDS-PAGE gel when 5 µg of purified protein is loaded. The
purified protein is also tested for endotoxin/LPS contamination, and typically the LPS
content is less than 0.1 ng/ml according to LAL assays.






6(d). Cloning and Expression of E. faecalis in Other Bacteria 

E. faecalis polypeptides can also be produced in: E. faecalis using the methods of
S. Skinner et al., (1988) Mol. Microbiol. 2:289-297 or J. I. Moreno (1996) Protein Expr.
Purif. 8(3):332-340; Lactobacillus using the methods of C. Rush et al., 1997 Appl.
Microbiol. Biotechnol. 47(5):537-542; or in Bacillus subtilis using the methods of Chang et
al., U.S. Patent No. 4,952,508.

7. Cloning and Expression in COS Cells

An E. faecalis expression plasmid is made by cloning a portion of the DNA
encoding an E. faecalis polypeptide into the expression vector pDNAI/Amp or pDNAIII
(which can be obtained from Invitrogen, Inc.). The expression vector pDNAI/Amp
contains: (1) an E. coli origin of replication effective for propagation in E. coli and other
prokaryotic cells; (2) an ampicillin resistance gene for selection of plasmid-containing
prokaryotic cells; (3) an SV40 origin of replication for propagation in eukaryotic cells;
(4) a CMV promoter, a polylinker, an SV40 intron; (5) several codons encoding a
hemagglutinin fragment (i.e., an "HA" tag to facilitate purification) followed by a
termination codon and polyadenylation signal arranged so that a DNA can be
conveniently placed under expression control of the CMV promoter and operably linked
to the SV40 intron and the polyadenylation signal by means of restriction sites in the
polylinker. The HA tag corresponds to an epitope derived from the influenza
hemagglutinin protein described by Wilson et al. 1984 Cell 37:767. The fusion of the HA
tag to the target protein allows easy detection and recovery of the recombinant protein
with an antibody that recognizes the HA epitope. pDNAIII contains, in addition, the
selectable neomycin marker.

A DNA fragment encoding an E. faecalis polypeptide is cloned into the polylinker
region of the vector so that recombinant protein expression is directed by the CMV
promoter. The plasmid construction strategy is as follows. The DNA from an E. faecalis
genomic DNA prep is amplified using primers that contain convenient restriction sites,
much as described above for construction of vectors for expression of E. faecalis in
E. coli. The 5' primer contains a Kozak sequence, an AUG start codon, and nucleotides
of the 5' coding region of the E. faecalis polypeptide. The 3' primer contains
nucleotides complementary to the 3' coding sequence of the E. faecalis DNA, a stop
codon, and a convenient restriction site.
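
The primer layout described above can be sketched in Python as follows. This is a minimal illustration only: the function names, the annealing length, the Kozak sequence GCCACC, the TAA stop codon, and the BamHI (GGATCC) and XbaI (TCTAGA) sites used in the usage example are assumptions, and the short coding sequence is a made-up placeholder rather than any sequence of SEQ ID NOS: 1-982. The coding fragment passed in is assumed not to carry its own start or stop codon.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(cds: str, site_5: str, site_3: str, anneal_len: int = 18,
                   kozak: str = "GCCACC", stop: str = "TAA") -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5' to 3'.

    forward: restriction site + Kozak sequence + ATG + start of the coding region
    reverse: restriction site + reverse complement of (end of coding region + stop)
    """
    cds = cds.upper()
    if len(cds) < anneal_len:
        raise ValueError("coding sequence shorter than the annealing length")
    forward = site_5 + kozak + "ATG" + cds[:anneal_len]
    reverse = site_3 + reverse_complement(cds[-anneal_len:] + stop)
    return forward, reverse


if __name__ == "__main__":
    # Placeholder coding fragment; BamHI and XbaI sites chosen for illustration
    fwd, rev = design_primers("AAACCCGGGTTTAAACCCGGG", "GGATCC", "TCTAGA",
                              anneal_len=6)
    print("5' primer:", fwd)
    print("3' primer:", rev)
```

In practice a few extra bases are often added 5' of each restriction site to aid digestion, and primer melting temperatures would be checked; neither is modeled here.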

The PCR amplified DNA fragment and the vector, pDNAI/Amp, are digested with
appropriate restriction enzymes and then ligated. The ligation mixture is transformed
into an appropriate E. coli strain such as SURE™ (Stratagene Cloning Systems, La Jolla,
CA 92037), and the transformed culture is plated on ampicillin media plates which then
are incubated to allow growth of ampicillin resistant colonies. Plasmid DNA is isolated
from resistant colonies and examined by restriction analysis or other means for the
presence of the fragment encoding the E. faecalis polypeptide.
For expression of a recombinant E. faecalis polypeptide, COS cells are transfected
with an expression vector, as described above, using DEAE-dextran, as described, for
instance, by Sambrook et al. (supra). Cells are incubated under conditions for expression
of E. faecalis by the vector.

Expression of the E. faecalis-HA fusion protein is detected by radiolabeling and
immunoprecipitation, using methods described in, for example, Harlow et al., supra. To
this end, two days after transfection, the cells are labeled by incubation in media
containing 35S-cysteine for 8 hours. The cells and the media are collected, and the cells
are washed and then lysed with detergent-containing RIPA buffer: 150 mM NaCl, 1%
NP-40, 0.1% SDS, 0.5% DOC, 50 mM TRIS, pH 7.5, as described by Wilson et al.
(supra). Proteins are precipitated from the cell lysate and from the culture media using
an HA-specific monoclonal antibody. The precipitated proteins then are analyzed by
SDS-PAGE and autoradiography. An expression product of the expected size is seen in
the cell lysate, which is not seen in negative controls.


8. Cloning and Expression in CHO Cells 

The vector pC4 is used for the expression of E. faecalis polypeptide in this
example. Plasmid pC4 is a derivative of the plasmid pSV2-dhfr (ATCC Accession No.
37146). The plasmid contains the mouse DHFR gene under control of the SV40 early
promoter. Chinese hamster ovary cells or other cells lacking dihydrofolate reductase activity that
are transfected with these plasmids can be selected by growing the cells in a selective
medium (alpha minus MEM, Life Technologies) supplemented with the chemotherapeutic
agent methotrexate. The amplification of the DHFR genes in cells resistant to
methotrexate (MTX) has been well documented. See, e.g., Alt et al., 1978, J. Biol.
Chem. 253:1357-1370; Hamlin et al., 1990, Biochem. et Biophys. Acta, 1097:107-143;
Page et al., 1991, Biotechnology 9:64-68. Cells grown in increasing concentrations of
MTX develop resistance to the drug by overproducing the target enzyme, DHFR, as a
result of amplification of the DHFR gene. If a second gene is linked to the DHFR gene, it
is usually co-amplified and over-expressed. It is known in the art that this approach may
be used to develop cell lines carrying more than 1,000 copies of the amplified gene(s).
Subsequently, when the methotrexate is withdrawn, cell lines are obtained which contain
the amplified gene integrated into one or more chromosome(s) of the host cell.

Plasmid pC4 contains the strong promoter of the long terminal repeat (LTR) of
the Rous Sarcoma Virus, for expressing a polypeptide of interest, Cullen, et al. (1985)
Mol. Cell. Biol. 5:438-447; plus a fragment isolated from the enhancer of the immediate
early gene of human cytomegalovirus (CMV), Boshart, et al., 1985, Cell 41:521-530.
Downstream of the promoter are the following single restriction enzyme cleavage sites
that allow the integration of the genes: Bam HI, Xba I, and Asp 718. Behind these
cloning sites the plasmid contains the 3' intron and polyadenylation site of the rat
preproinsulin gene. Other high efficiency promoters can also be used for the expression,
e.g., the human β-actin promoter, the SV40 early or late promoters or the long terminal
repeats from other retroviruses, e.g., HIV and HTLVI. Clontech's Tet-Off and Tet-On
gene expression systems and similar systems can be used to express the E. faecalis
polypeptide in a regulated way in mammalian cells (Gossen et al., 1992, Proc. Natl. Acad.
Sci. USA 89:5547-5551). For the polyadenylation of the mRNA other signals, e.g., from
the human growth hormone, or globin genes can be used as well. Stable cell lines carrying
a gene of interest integrated into the chromosomes can also be selected upon
co-transfection with a selectable marker such as gpt, G418 or hygromycin. It is
advantageous to use more than one selectable marker in the beginning, e.g., G418 plus
methotrexate.

The plasmid pC4 is digested with the restriction enzymes and then
dephosphorylated using calf intestinal phosphatase by procedures known in the art. The
vector is then isolated from a 1% agarose gel. The DNA sequence encoding the E.
faecalis polypeptide is amplified using PCR oligonucleotide primers corresponding to the
5' and 3' sequences of the desired portion of the gene. A 5' primer containing a
restriction site, a Kozak sequence, an AUG start codon, and nucleotides of the 5' coding
region of the E. faecalis polypeptide is synthesized and used. A 3' primer, containing a
restriction site, stop codon, and nucleotides complementary to the 3' coding sequence of
the E. faecalis polypeptide is synthesized and used. The amplified fragment is digested
with the restriction endonucleases and then purified again on a 1% agarose gel. The
isolated fragment and the dephosphorylated vector are then ligated with T4 DNA ligase.
E. coli HB101 or XL-1 Blue cells are then transformed and bacteria are identified that
contain the fragment inserted into plasmid pC4 using, for instance, restriction enzyme
analysis.

Chinese hamster ovary cells lacking an active DHFR gene are used for
transfection. Five µg of the expression plasmid pC4 is cotransfected with 0.5 µg of the
plasmid pSVneo using a lipid-mediated transfection agent such as Lipofectin™ or
LipofectAMINE™ (Life Technologies, Gaithersburg, MD). The plasmid pSV2-neo
contains a dominant selectable marker, the neo gene from Tn5 encoding an enzyme that
confers resistance to a group of antibiotics including G418. The cells are seeded in alpha
minus MEM supplemented with 1 mg/ml G418. After 2 days, the cells are trypsinized and
seeded in hybridoma cloning plates (Greiner, Germany) in alpha minus MEM
supplemented with 10, 25, or 50 ng/ml of methotrexate plus 1 mg/ml G418. After about
10-14 days single clones are trypsinized and then seeded in 6-well petri dishes or 10 ml
flasks using different concentrations of methotrexate (50 nM, 100 nM, 200 nM, 400
nM, 800 nM). Clones growing at the highest concentrations of methotrexate are then
transferred to new 6-well plates containing even higher concentrations of methotrexate
(1 µM, 2 µM, 5 µM, 10 µM, 20 µM). The same procedure is repeated until clones are
obtained which grow at a concentration of 100-200 µM. Expression of the desired gene
product is analyzed, for instance, by SDS-PAGE and Western blot or by reversed phase
HPLC analysis.

9. Quantitative Murine Soft Tissue Infection Model for
E. faecalis

Compositions of the present invention, including polypeptides and peptides, are
assayed for their ability to function as vaccines or to enhance/stimulate an immune
response to a bacterial species (e.g., E. faecalis) using the following quantitative murine
soft tissue infection model. Mice (e.g., NIH Swiss female mice, approximately 7 weeks
old) are first treated with a biologically protective effective amount, or immune
enhancing/stimulating effective amount of a composition of the present invention using
methods known in the art, such as those discussed above. See, e.g., Harlow et al.,
ANTIBODIES: A LABORATORY MANUAL, (Cold Spring Harbor Laboratory Press, 2nd
ed. 1988). An example of an appropriate starting dose is 20 µg per animal.

The desired bacterial species used to challenge the mice, such as E. faecalis, is
grown as an overnight culture. The culture is diluted to a concentration of 5 X 10^8
cfu/ml, in an appropriate media, mixed well, serially diluted, and titered. The desired
doses are further diluted 1:2 with sterilized Cytodex 3 microcarrier beads preswollen in
sterile PBS (3 g/100 ml). Mice are anesthetized briefly until docile, but still mobile, and
injected with 0.2 ml of the Cytodex 3 bead/bacterial mixture into each animal
subcutaneously in the inguinal region. After four days, counting the day of injection as
day one, mice are sacrificed and the contents of the abscess are excised and placed in a 15
ml conical tube containing 1.0 ml of sterile PBS. The contents of the abscess are then
enzymatically treated and plated as follows.

The abscess is first disrupted by vortexing with sterilized glass beads placed in the
tubes. 3.0 ml of prepared enzyme mixture (1.0 ml Collagenase D (4.0 mg/ml), 1.0 ml
Trypsin (6.0 mg/ml) and 8.0 ml PBS) is then added to each tube followed by a 20 min.
incubation at 37°C. The solution is then centrifuged and the supernatant drawn off. 0.5
ml dH2O is then added and the tubes are vortexed and then incubated for 10 min. at room
temperature. 0.5 ml media is then added and samples are serially diluted and plated onto
agar plates, and grown overnight at 37°C. Plates with distinct and separate colonies are
then counted, compared to positive and negative control samples, and quantified. The
method can be used to identify compositions and determine appropriate and effective
doses for humans and other animals by comparing the effective doses of compositions of
the present invention with compositions known in the art to be effective in both mice
and humans. Doses for the effective treatment of humans and other animals, using
compositions of the present invention, are extrapolated using the data from the above
experiments in mice. It is appreciated that further studies in humans and other animals
may be needed to determine the most effective doses using methods of clinical practice
known in the art.
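
The quantification in the soft tissue model above rests on two simple calculations, sketched here in Python: the number of colony forming units delivered per animal, and the back-calculation of cfu/ml from a plate count. The function names, the plated volume and the example colony count are hypothetical, and the 1:2 dilution with beads is read as halving the bacterial concentration.

```python
def challenge_cfu_per_animal(dose_cfu_per_ml: float, bead_dilution: float = 2.0,
                             injected_ml: float = 0.2) -> float:
    """cfu injected per mouse after diluting the dose 1:bead_dilution with beads."""
    return dose_cfu_per_ml / bead_dilution * injected_ml


def cfu_per_ml_from_plate(colonies: int, dilution_factor: float,
                          plated_ml: float) -> float:
    """cfu/ml of the undiluted sample from colonies counted on one plate.

    dilution_factor is the total fold dilution of the plated sample
    (e.g. 1e4 for a 10^-4 dilution).
    """
    if plated_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_ml


if __name__ == "__main__":
    # A 5 x 10^8 cfu/ml dose, diluted 1:2 with beads, 0.2 ml injected per mouse
    print(f"Challenge: {challenge_cfu_per_animal(5e8):.2e} cfu per animal")
    # Hypothetical plate: 42 colonies from 0.1 ml of a 10^-4 dilution
    print(f"Recovered: {cfu_per_ml_from_plate(42, 1e4, 0.1):.2e} cfu/ml")
```

Comparing recovered counts between treated and control animals is then a matter of applying the second function to each plate.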

10. Murine Systemic Neutropenic Model for E. faecalis Infection

Compositions of the present invention, including polypeptides and peptides, are
assayed for their ability to function as vaccines or to enhance/stimulate an immune
response to a bacterial species (e.g., E. faecalis) using the following qualitative murine
systemic neutropenic model. Mice (e.g., NIH Swiss female mice, approximately 7 weeks
old) are first treated with a biologically protective effective amount, or immune
enhancing/stimulating effective amount of a composition of the present invention using
methods known in the art, such as those discussed above. See, e.g., Harlow et al.,
ANTIBODIES: A LABORATORY MANUAL, (Cold Spring Harbor Laboratory Press, 2nd
ed. 1988). An example of an appropriate starting dose is 20 µg per animal.

Mice are then injected with 250 - 300 mg/kg cyclophosphamide intraperitoneally.
Counting the day of C.P. injection as day one, the mice are left untreated for 5 days to
begin recovery of PMNLs.
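
Because the cyclophosphamide dose above is given per kilogram of body weight, the amount per animal must be scaled to each mouse. The following Python sketch shows the conversion; the 25 g body weight is a hypothetical value used only for illustration, and the function name is an assumption.

```python
def dose_mg(body_weight_g: float, dose_mg_per_kg: float) -> float:
    """Absolute dose in mg for an animal of the given weight in grams."""
    if body_weight_g <= 0 or dose_mg_per_kg < 0:
        raise ValueError("weight must be positive and dose non-negative")
    return body_weight_g / 1000.0 * dose_mg_per_kg


if __name__ == "__main__":
    weight_g = 25.0  # hypothetical mouse weight
    low, high = dose_mg(weight_g, 250.0), dose_mg(weight_g, 300.0)
    print(f"Cyclophosphamide per {weight_g:.0f} g mouse: {low:.2f} to {high:.2f} mg")
```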

The desired bacterial species used to challenge the mice, such as E. faecalis, is
grown as an overnight culture. The culture is diluted to a concentration of 5 X 10^8
cfu/ml, in an appropriate media, mixed well, serially diluted, and titered. The desired
doses are further diluted 1:2 in 4% Brewer's yeast in media.

Mice are injected with the bacteria/Brewer's yeast challenge intraperitoneally. The
Brewer's yeast solution alone is used as a control. The mice are then monitored twice
daily for the first week following challenge, and once a day for the next week to ascertain
morbidity and mortality. Mice remaining at the end of the experiment are sacrificed.

The method can be used to identify compositions and determine appropriate and
effective doses for humans and other animals by comparing the effective doses of
compositions of the present invention with compositions known in the art to be
effective in both mice and humans. Doses for the effective treatment of humans and
other animals, using compositions of the present invention, are extrapolated using the
data from the above experiments in mice. It is appreciated that further studies in humans
and other animals may be needed to determine the most effective doses using methods of
clinical practice known in the art.

The disclosures of all publications (including patents, patent applications, journal
articles, laboratory manuals, books, or other documents) cited herein are hereby
incorporated by reference in their entireties.

The present invention is not to be limited in scope by the specific embodiments
described herein, which are intended as single illustrations of individual aspects of the
invention. Functionally equivalent methods and components are within the scope of the
invention, in addition to those shown and described herein and will become apparent to
those skilled in the art from the foregoing description and accompanying drawings. Such
modifications are intended to fall within the scope of the appended claims.
Table 2: E. faecalis - Putative coding regions of novel proteins similar to known proteins
PAGES 280 TO 2076, WHICH ARE THE COMPLETE SEQUENCE
DESCRIPTION, CLAIMS, ABSTRACT & DRAWINGS.
(i) APPLICANT: Charles Kunsch

Patrick J. Dillon
Steven C. Barash

(ii) TITLE OF INVENTION: Enterococcus faecalis Polynucleotides and
Polypeptides

(iii) NUMBER OF SEQUENCES: 982 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Human Genome Sciences, Inc.

(B) STREET: 9410 Key West Avenue 

(C) CITY: Rockville 

(D) STATE: Maryland 

(E) COUNTRY: USA 

(F) ZIP: 20850 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

(B) COMPUTER: HP Vectra 486/33 

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: herewith 

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: A. Anders Brookes

(B) REGISTRATION NUMBER: 36,373

(C) REFERENCE/DOCKET NUMBER: PB369PCT

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (301) 309-8504

(B) TELEFAX: (301) 309-8512 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4315 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AAGTCAATCA CTTGCAAGTC GTTTTCTGTC ATATGGCCGA CTAGGTGAG, GAGAGCAGGA 
ATATAGCCAC CAGGGAAAAT ATAAGGATTA «<»»c™ 

CGACTGATCC CATGAATCAA CGCCGTACCT TTAGGCGCTA AATTCCGCTG AACGACATCA 

AAATATTCAT GTAGATTTTC CGCACCGACA TGTTCAAACA TCCCAACACT CGTAATATGG

TCAAAAGACT CTCCTTTTAA ATCACGATAA TCCATCAATT TGACAGTCAT TCGATCTTGT 
AGATCTTCTT TTTCTATAAT ATGGCGAATA TGATGAAATT GCTCTTCACT TAATGTAATC
CCAGTTGCTT TGGCTCCATA TTCTTTCACC GCAGTTAAAA TTAACGTGCC CCAGCCGCAG 
CCAATATCCA GTAAAGTGTC GCCCTCTTTG ATAAACAATT TATCTAAAAT ATGATGAACT 
TTATTCACTT GCGCTTGTTC TAATGTATCT TCAGGCGTTT TAAAATAAGC ACATGAATAC
GTCATTGTTT GGTCAAGCCA TTTTTTGTAA AAATCATTTC GTAGATCGTA ATGGCTGTGA 
ATATCCTCTT GCGAACGTTT TTTTGAATGA CTTTCTTTAG GAAGGCATTT AATAAATTTA 
GCATTGTGTA AAAAGCTATG CTTTTGGTTA TACACATCAT AAATCAG.GC TTGGATATCG 
CCTTCGATTT CAATTTTGCG ATCCATGTAG GCTCCCCTA AAGTTAACGA AGCGTTATTC
AGTAAATCCT TCACAGGAAT TTTTTCATTG AATACAATTT TAAAAACCGG ATCGCGCGAG 
CCTTGCCCAT ACTCTTTGAC GGTACGATCC CAGTATGTGA CTTGTGTCTT TTTTGAAAAA 
GACCATTTAA ACAGTTGACT GTACGTTTCT TTTTCTAACA TTGCATTCCC TCCATTAAAT 
ACGA^GAA GCGAAAACAA AAAGAAGTCG CTTTCCGGTA GTTCGTCAAA AGAAAGACCA 
CAGTCCGTTC TAAACTGAAG CACAGAAAAG TTATCAGGCG TTCTATGTTC CGCTTCTTTT
TTTGCAATTA CAGTTCTATT C7ACTCCTCT TTTAAAAATT TGAACATTCT TTTAACGTAA
TACCTACTAT TGTTATTCTT TATCACAAAA AAACTAGAGC CAGTCCTTGA CAGACTCCTC 
TAGTTCTAAA TATTATGCTT TCTTACGCAT CCGTTGTTCC GCA7GAGTGT AAGCGCCATG 
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM



A. The indications made below relate to the microorganism referred to in the description 
on page 8 , line 27 



B. IDENTIFICATION OF DEPOSIT Further deposits are identified on an additional sheet

Name of depositary institution American Type Culture Collection 



Address of depositary institution (including postal code and country) 

10801 University Boulevard 
Manassas, Virginia 20110-2209
United States of America 



Date of deposit May 2, 1997



Accession Number 55969 



C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet 



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States) 



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession
Number of Deposit")



For receiving Office use only
This sheet was received with the international application



Authorized officer



For International Bureau use only

This sheet was received by the International Bureau on:



Authorized officer 
What Is Claimed Is:



1. Computer readable medium having recorded thereon the nucleotide sequence
depicted in SEQ ID NOS: 1-982, a representative fragment thereof or a nucleotide
sequence at least 95% identical to a nucleotide sequence depicted in SEQ ID NOS: 1-982.

2. The computer readable medium of claim 1 having recorded thereon any one 
of the fragments of SEQ ID NOS: 1-982 depicted in Tables 2 and 3 or a degenerate 
variant thereof. 


3. The computer readable medium of claim 1, wherein said medium is selected 
from the group consisting of a floppy disc, a hard disc, random access memory (RAM), 
read only memory (ROM), and CD-ROM. 

4. The computer readable medium of claim 3, wherein said medium is selected 
from the group consisting of a floppy disc, a hard disc, random access memory (RAM), 
read only memory (ROM), and CD-ROM. 



5. A computer-based system for identifying fragments of the Enterococcus 
faecalis genome of commercial importance comprising the following elements: 

a) a data storage means comprising the nucleotide sequence of SEQ ID NOS: 1-982, 
a representative fragment thereof, or a nucleotide sequence at least 95% identical to a 
nucleotide sequence of SEQ ID NOS: 1-982; 

b) search means for comparing a target sequence to the nucleotide sequence of 
the data storage means of step (a) to identify homologous sequence(s), and 

c) retrieval means for obtaining said homologous sequence(s) of step (b). 
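
By way of illustration only, the three elements recited above (data storage means, search means, retrieval means) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the claimed system: the names `Hit`, `percent_identity`, `search` and `retrieve`, the toy sequences, and the use of an ungapped sliding-window comparison with a 0.95 identity cutoff as the test for homology are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One homologous region found by the search step (hypothetical record type)."""
    seq_id: str      # key of the stored sequence
    offset: int      # 0-based start of the matching window
    identity: float  # fraction of identical positions in the window


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def search(store: dict, target: str, min_identity: float = 0.95) -> list:
    """Search step: slide the target along every stored sequence (no gaps)
    and keep each window whose identity meets the assumed cutoff."""
    hits = []
    n = len(target)
    for seq_id, seq in store.items():
        for i in range(len(seq) - n + 1):
            identity = percent_identity(seq[i:i + n], target)
            if identity >= min_identity:
                hits.append(Hit(seq_id, i, identity))
    return hits


def retrieve(store: dict, hit: Hit, length: int) -> str:
    """Retrieval step: return the stored subsequence that a hit points to."""
    return store[hit.seq_id][hit.offset:hit.offset + length]


# Data storage step: a toy in-memory store standing in for the sequence data.
store = {"contig_A": "ACGTACGTTAGC", "contig_B": "TTTTTTTT"}
hits = search(store, "ACGTTAGC")
for h in hits:
    print(h.seq_id, h.offset, retrieve(store, h, len("ACGTTAGC")))
```

A practical search means would more likely rely on an alignment program that tolerates gaps; the ungapped window is used here only to keep the sketch short.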



6. A method for identifying commercially important nucleic acid fragments of 
the Enterococcus faecalis genome comprising the step of comparing a database 
comprising the nucleotide sequences depicted in SEQ ID NOS: 1-982, a representative 
fragment thereof, or a nucleotide sequence at least 95% identical to a nucleotide sequence 
of SEQ ID NOS: 1-982 with a target sequence to obtain a nucleic acid molecule comprised 
of a complementary nucleotide sequence to said target sequence, wherein said target 
sequence is not randomly selected. 
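
For illustration only, the comparing step that yields a molecule with a sequence complementary to the target can be sketched as a reverse-complement lookup. This is a minimal sketch under stated assumptions: the function names `reverse_complement` and `find_complementary`, the toy database, and the restriction to exact matches over the four unambiguous bases are hypothetical and are not taken from the specification.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_complementary(database: dict, target: str) -> list:
    """Return (name, 0-based offset) pairs for every exact occurrence of the
    target's reverse complement in the database sequences."""
    probe = reverse_complement(target)
    found = []
    for name, seq in database.items():
        seq = seq.upper()
        start = seq.find(probe)
        while start != -1:
            found.append((name, start))
            start = seq.find(probe, start + 1)  # continue past this match
    return found


# Toy database standing in for the stored sequences (hypothetical data).
database = {"contig_A": "CCTGTAATCGG", "contig_B": "AAAAAAAAAA"}
print(find_complementary(database, "GATTACA"))
```

The target in such a search would be chosen deliberately, for example a known motif, which is consistent with the requirement that it not be randomly selected.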


7. A method for identifying an expression modulating fragment of Enterococcus 
faecalis genome comprising the step of comparing a database comprising the nucleotide 
sequences depicted in SEQ ID NOS: 1-982, a representative fragment thereof, or a 
nucleotide sequence at least 95% identical to the nucleotide sequence of SEQ ID NOS: 1-982 
with a target sequence to obtain a nucleic acid molecule comprised of a 
complementary nucleotide sequence to said target sequence, wherein said target sequence 
comprises sequences known to regulate gene expression. 

8. An isolated protein-encoding nucleic acid fragment of the Enterococcus 
faecalis genome, wherein said fragment consists of the nucleotide sequence of any one of 
the fragments of SEQ ID NOS: 1-982 depicted in Tables 2 and 3, or a degenerate variant 
thereof. 

9. A vector comprising any one of the fragments of the Enterococcus faecalis 
genome of claim 8. 

10. An isolated fragment of the Enterococcus faecalis genome, wherein said 
fragment modulates the expression of an operably linked open reading frame, wherein 
said fragment consists of the nucleotide sequence from about 10 to 200 bases in length 
which is 5' to any one of the open reading frames of claim 8. 

11. A vector comprising any one of the fragments of the Enterococcus faecalis 
genome of claim 8. 


12. An organism which has been altered to contain any one of the fragments of 
the Enterococcus faecalis genome of claim 8. 

13. An organism which has been altered to contain any one of the fragments of 
the Enterococcus faecalis genome of claim 10. 

14. A method for regulating the expression of a nucleic acid molecule 
comprising the step of covalently attaching to said nucleic acid molecule a nucleic 
acid molecule of claim 10. 


15. An isolated polypeptide encoded by any of the fragments of the 
Enterococcus faecalis genome of claim 8. 

16. An isolated polynucleotide molecule encoding any one of the polypeptides 
of claim 15. 

17. An antibody which selectively binds to any one of the polypeptides of claim 15. 

18. A method for producing a polypeptide in a host cell comprising the steps of: 

a) incubating a host containing a heterologous nucleic acid molecule whose 
nucleotide sequence consists of any one of the fragments of the Enterococcus faecalis 
genome of claim 8, under conditions where said heterologous nucleic acid molecule is 
expressed to produce said protein, and 

b) isolating said protein. 

(57) Abstract 

Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet) 

This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons: 



1. Claims Nos.: 

because they relate to subject matter not required to be searched by this Authority, namely: 

See Remark 



2. [ ] Claims Nos.: 

because they relate to parts of the International Application that do not comply with the prescribed requirements to such 
an extent that no meaningful International Search can be carried out, specifically: 



3. [ ] Claims Nos.: 

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 

This International Searching Authority found multiple inventions in this international application, as follows: 



1. [ ] As all required additional search fees were timely paid by the applicant, this International Search Report covers all 
searchable claims. 

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 



3. [ ] As only some of the required additional search fees were timely paid by the applicant, this International Search Report 
covers only those claims for which fees were paid, specifically claims Nos.: 



4. [X] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is 
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 

1-7, see subject 1 



Remark on Protest [ ] The additional search fees were accompanied by the applicant's protest. 

[ ] No protest accompanied the payment of additional search fees. 

This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1-7 

Computer readable medium having recorded thereon the 
nucleotide sequences depicted in SEQ ID no. 1-982, a 
representative fragment thereof or a nucleotide sequence at 
least 95% identical to a nucleotide sequence depicted in SEQ 
ID no. 1-982; a computer-based system for identifying 
fragments of the Enterococcus faecalis genome of 
commercial importance comprising: a) a data storage means 
comprising said nucleotide sequence(s); b) search means for 
comparing a target sequence to the nucleotide sequences of 
the data storage means of step (a) to identify homologous 
sequence(s), and c) retrieval means for obtaining said 
homologous sequence(s) of step (b); a method for identifying 
commercially important nucleic acid fragments of the 
Enterococcus faecalis genome comprising the step of 
comparing a database comprising said nucleotide sequence(s) 
with a target sequence to obtain a nucleic acid molecule 
comprised of a complementary nucleotide sequence to said 
target sequence, wherein said target sequence is not 
randomly selected; a method for identifying an expression 
modulating fragment of the Enterococcus faecalis genome 
comprising the step of comparing a database comprising said 
nucleotide sequence(s) with a target sequence to obtain a 
nucleic acid molecule comprised of a complementary 
nucleotide sequence to said target sequence, wherein said 
target sequence comprises sequences known to regulate gene 
expression; 



2. Claims: (8-18) partially 

An isolated protein-encoding nucleic acid fragment of the 
Enterococcus faecalis genome, wherein said fragment 
consists of the nucleotide sequence of any one of the 
fragments of SEQ ID no. 1 depicted in Tables 2 and 3, or a 
degenerate variant thereof; a vector comprising any one of 
the fragments of SEQ ID no. 1 depicted in Tables 2 and 3; an 
isolated fragment of the Enterococcus faecalis genome, 
wherein said fragment modulates the expression of an 
operably linked open reading frame, wherein said fragment 
consists of the nucleotide sequence from about 10 to 200 
bases in length which is 5' to any one of the open reading 
frames of SEQ ID no. 1 depicted in Tables 2 and 3 or a 
degenerate variant thereof; a method for regulating the 
expression of a nucleic acid molecule comprising the step of 
covalently attaching to said nucleic acid molecule a nucleic 
acid molecule consisting of the nucleotide sequence from 
about 10 to 200 bases 5' to any one of the open reading 
frames of SEQ ID no. 1 depicted in Tables 2 and 3 or a 
degenerate variant thereof; an isolated polypeptide encoded 
by any one of the fragments of SEQ ID no. 1 depicted in Tables 
2 and 3; an antibody which selectively binds to any one of 
said polypeptides; a method for producing a polypeptide in a 
host cell comprising a) incubating a host containing a 
heterologous nucleic acid molecule whose nucleotide sequence 
consists of any one of the fragments of SEQ ID no. 1 depicted 
in Tables 2 and 3, under conditions where said 
heterologous nucleic acid molecule is expressed to produce 
said protein, and b) isolating said protein; 



3-983. Claims: (8-18) partially 

Idem as subject 2 but limited to each of the sequences 
of SEQ ID no. 2 to 982, i.e. invention 3 is limited to the 
fragments of SEQ ID no. 2 depicted in Tables 2 and 3, 
invention 4 is limited to the fragments of SEQ ID no. 3 
depicted in Tables 2 and 3, and so on. 

For the sake of conciseness, the second subject matter is 
explicitly defined; the other subject matters are defined by 
analogy hereto. 



REMARK: 

Although claims 1-4 could, at least partially, be considered as a mere 
presentation of information (Rule 39.1(v) PCT), and claims 5-7 at least 
partially as a program for computers (Rule 39.1(vi) PCT), the search has 
been carried out as far as possible in our systematic documentation. 